SemiKong: Pioneering the First Open-Source LLM for Semiconductor Industry

Francis Benistant
7 min readJan 5, 2025

--

In January 2025, we just had the opportunity to evaluate SemiKong, the first open-source Large Language Model (LLM) specifically designed for the semiconductor industry. This interesting model, unveiled at SEMICON West 2024, represents a first step forward in applying artificial intelligence to semiconductor manufacturing and design.

The Genesis of SemiKong

SemiKong emerged from a collaborative effort between the AI Alliance, led by Aitomatic, with crucial contributions from Tokyo Electron and FPT Software. Built on Meta’s Llama 3.1 70B architecture, SemiKong was trained on an impressive dataset including:
- 129 books and book chapters
- 708 etching-specific research papers
- 20,000 general research papers
- 50,000 specialized instructions
- A total of 525.6 million tokens

Technical Innovation

What sets SemiKong apart is its Domain-Aware Neurosymbolic Agents (DANA) architecture, which effectively structures expert knowledge and integrates trained Domain Expert Agents with manufacturing execution systems. This innovative approach enables SemiKong to handle complex semiconductor manufacturing challenges with unprecedented accuracy.

Evaluation Framework and Results

As SemiKong is a large model that cannot run locally, I used LMStudio with a GGUF Q5 model to work locally on my Razer 16 (GPU RTX 4090).

The first tasks was to check the datasets available on HuggingFace.

The datatojson.py file converts the dataset into a readable json format file.

The dataset used for SemiKong training has 15000 entries on general semiconductor knowledge

Once the dataset is converted to json, we can browse it to check the topics inside. For that the indexsearch.py can browse the json file, returning the indexes where a specific topic is mentioned.

Checking TSMC related topic into the entries of the json dataset file

After, this first task we can move on the inference to evaluate the quality of the SemiKong model using the SKEval.py script.

This script is dedicated to test SemiKong model on semiconductor specific knowledge with json files as input files. For more details on the SKEval.py file, check the GitHub site.

This evaluation of SemiKong focused on two key approaches:

1. Direct inference through LMStudio

Example of queries input inside LMStudio GUI with SemiKong answer
Example of queries input inside LMStudio GUI with SemiKong answer

As the two pictures show above show it, we have a balanced performance between good and bad results.

As testing with direct inference inside the LMStudio GUI is not the optimum way to evaluate the performance of the LLM, we tried to use json files with specific topics.

2. Structured JSON-based query testing using LMStudio API

The json files are strcutured with an input and an expected output, and we compare the answer of the LLM to the expected answer.

Few issues arised when suing thsi method:

1- The expected answer needs to be as accurate as possible. This leads to the need of defining a set of tests among the semcionductor community to objectively and accurately evakuate any LLM dedicated to the semiconductor industry.

The comparison between the LLM answer and the expected answer is not straight-forward and requires several matching methods, then taking the average of these methods to decide if the LLM passed or failed the test.

We used the Fuzzy matching method, the Keyword matching method, and the Semantic matching method.

Fuzzy Matching

  • Purpose: Evaluates how closely the LLM’s response matches the expected output based on partial string similarity.
  • How It Works:
  • Uses the rapidfuzz.partial_ratio function to calculate a similarity score between the expected output and the LLM's response.
  • The score ranges from 0 to 100, where 100 indicates a perfect match.
  • Threshold:
  • A match is considered a Pass if the similarity score is greater than or equal to the fuzzy_threshold (default is 60).
  • Example:
  • Expected Output: “Electronic Design Automation”
  • LLM Response: “EDA stands for Electronic Design Automation.”
  • Fuzzy Score: 100 (Pass)

Keyword Matching

  • Purpose: Checks if all keywords from the expected output are present in the LLM’s response.
  • How It Works:
  • Splits the expected output into individual keywords (based on spaces).
  • Checks if each keyword appears in the LLM’s response (case-insensitive).
  • Calculates a similarity score as the percentage of keywords found in the response.
  • Threshold:
  • A match is considered a Pass if all keywords are found (implicit threshold of 100%).
  • Example:
  • Expected Output: “Technology Computer-Aided Design”
  • LLM Response: “TCAD stands for Technology Computer-Aided Design.”
  • Keyword Score: 100 (Pass)

RapidFuzz Matching

  • Purpose
  • Measures the similarity between the expected output and the LLM’s response using a fast and efficient string matching algorithm.
  • How it works
  • Normalizes both the expected output and the LLM’s response by converting them to lowercase and removing extra spaces.
  • Uses the rapid_fuzz.partial_ratio method to calculate a similarity score between the two strings.
  • The score represents how closely the LLM’s response matches the expected output, even if the phrasing or structure differs slightly.
  • Threshold:
  • A match is considered a Pass if the similarity score meets or exceeds the predefined fuzzy_threshold (set to 60 in the configuration).
  • Example
  • Expected Output: “Technology Computer-Aided Design”
  • LLM Response: “TCAD stands for Technology Computer-Aided Design.”
  • RapidFuzz Score: 100 (Pass)

CONFIG = {
“api_url”: “http://API-endpoint/v1/chat/completions",
“model_name”: “semikong-70b”,
“max_tokens”: 150,
“temperature”: 0.3,
“fuzzy_threshold”: 60, # Adjust this threshold as needed
“keyword_threshold”: 0.6, # Threshold for keyword-based matching
}

JSON files used for SemiKong evaluation

The results and the scripts are in

https://github.com/FBTCAD/SemiKong-LLM-Evaluation.git

and revealed interesting patterns:

Strengths:
- Excellent understanding of general semiconductor concepts
- Strong performance in detailed technical explanations
- Accurate responses to standard industry terminology

Areas for Improvement:
- Occasional inconsistencies in specialized technical areas, eg, TCAD.
- Variable accuracy in complex manufacturing processes
- Some limitations in understanding advanced industry-specific acronyms

The embbeded file below shows the results of testing specific terms for EDA and TCAD. The overall accuracy is 60% over 50 queries.

As we can see for GDSII, which could be also Graphic Design System II, the answer is wrong but the expected answer could be improved, and for HSPICE SemiKong is right but the exoected answer is wrong! The list was done with another LLM which obviously did not know HSPICE menaing (Hierarchical Simulation Program with Integrated Circuit Emphasis).

This clearly indicates the need to define a set of benchmark tests accepted and validated by the whole semiconductor community.

=== Test Case 1 ===
Category: Terminology Recognition
Input: What does 'EDA' stand for?
Expected: Electronic Design Automation
LLM Response: EDA stands for Electronic Design Automation.
Fuzzy Match: Pass (Score: 100.00)
Keyword Match: Pass (Score: 100.00)
RapidFuzz Match: Pass (Score: 100.00)
Average Score: 100.00
Result: Pass

=== Test Case 2 ===
Category: Terminology Recognition
Input: What does 'TCAD' stand for?
Expected: Technology Computer-Aided Design
LLM Response: TCAD stands for Technology Computer-Aided Design.
Fuzzy Match: Pass (Score: 100.00)
Keyword Match: Pass (Score: 100.00)
RapidFuzz Match: Pass (Score: 100.00)
Average Score: 100.00
Result: Pass

=== Test Case 7 ===
Category: Terminology Recognition
Input: What does 'GDSII' stand for?
Expected: Graphic Data System II
LLM Response: GDSII stands for Calibre Design Stream II.
Fuzzy Match: Fail (Score: 50.00)
Keyword Match: Fail (Score: 25.00)
RapidFuzz Match: Fail (Score: 52.38)
Average Score: 42.46
Result: Fail

=== Test Case 48 ===
Category: Terminology Recognition
Input: What does 'HSPICE' stand for?
Expected: High-Speed SPICE
LLM Response: HSPICE stands for Hierarchy SPICE (Simulation Program with Integrated Circuit Emphasis).
Fuzzy Match: Fail (Score: 56.00)
Keyword Match: Fail (Score: 33.33)
RapidFuzz Match: Fail (Score: 56.25)
Average Score: 48.53
Result: Fail

=== Test Case 49 ===
Category: Terminology Recognition
Input: What is a 'Field solver'?
Expected: A field solver provides a highly accurate extraction of parasitic RC by solving partial differential equations (PDEs) like Maxwell's equations
LLM Response: A Field Solver is a computational tool used in semiconductor process simulation to solve Maxwell's equations and other partial differential equations, enabling the calculation of electromagnetic fields, thermal distributions, and other physical phenomena within a device or system.
Fuzzy Match: Fail (Score: 47.00)
Keyword Match: Fail (Score: 50.00)
RapidFuzz Match: Fail (Score: 57.60)
Average Score: 51.53
Result: Fail

=== Test Case 50 ===
Category: Terminology Recognition
Input: What does 'PE' in TCAD stands for?
Expected: Process Emulation
LLM Response: In TCAD (Technology Computer-Aided Design), 'PE' stands for Process Emulation.
Fuzzy Match: Pass (Score: 100.00)
Keyword Match: Pass (Score: 100.00)
RapidFuzz Match: Pass (Score: 100.00)
Average Score: 100.00
Result: Pass

Accuracy: 60.00%

The Path Forward

SemiKong represents more than just another LLM; it’s a foundation for future development in semiconductor-focused AI. The semiconductor community now faces the challenge of developing standardized benchmarks that can effectively evaluate such specialized models.

Call to Action

The emergence of SemiKong highlights several crucial needs in the semiconductor industry:
1. Development of transparent, industry-specific benchmarks
2. Creation of standardized evaluation frameworks
3. Collaborative advancement of AI solutions specifically for semiconductor applications

Conclusion

While SemiKong shows immense promise as a first-generation semiconductor-specific LLM, it also illuminates the path forward for more specialized AI models in the semiconductor industry. Its development and evaluation framework provide valuable lessons for future iterations and similar industry-specific AI initiatives.

References:

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

--

--

No responses yet

Write a response