Korean startup targets Nvidia-dominated AI inference market with 2027 chip launch

Hyper Accel CEO Kim Joo-young poses for a photo with the company's first AI chip, codenamed Bertha 500, during an interview with the Korea JoongAng Daily on March 10. [PARK SANG-MOON]

[INTERVIEW]
 
A small Korean fabless startup, Hyper Accel, says its first AI chip, designed for language-model inference in data centers, could be up to five times more efficient than Nvidia GPUs when it launches next year.
 
The first samples of the startup’s AI chip, code-named Bertha 500, recently came out, built on Samsung Foundry’s 4-nanometer logic process and co-developed with Naver Cloud. Hyper Accel CEO and founder Kim Joo-young is targeting mass production of the Bertha 500 in early 2027, with initial supply planned for Naver Cloud and potential global Big Tech customers.
 
“We are looking beyond the domestic market and are in discussions with several global technology companies, although those talks are still at an early stage,” Kim said in a recent interview with the Korea JoongAng Daily at the startup’s office in southern Seoul.
 
Following its release, the Bertha 500 will enter the bring-up phase, when engineers power on the hardware and load the full software stack, a process expected to take three to four months. The chip will then be packaged into PCIe accelerator cards for installation in servers, with proof-of-concept tests to follow in September.
 
“If everything goes well, we will begin supplying the chips to customers and move toward mass production the following year.”
 
The AI chip market is rapidly shifting from training models to running them in real-world applications, known as inference. The shift is exemplified by Nvidia’s $20 billion investment in U.S. AI chip startup Groq, its largest deal yet, which integrates Groq’s processors with Nvidia’s Vera Rubin platform.
 
Hyper Accel CEO Kim Joo-young speaks during an interview with the Korea JoongAng Daily on March 10. [PARK SANG-MOON]

 
Kim has been quietly working on similar ideas for several years. In 2022, he presented research at the Hot Chips conference describing a specialized processor for running language models, what is now widely known as a language processing unit (LPU). The term later gained wider attention when Groq branded its chips as LPUs in 2023 to distinguish them from Nvidia’s GPUs and Google’s tensor processing units (TPUs).
 
Before founding Hyper Accel, Kim spent nine years at Microsoft, working on chip development at Microsoft Research and Azure. He then returned to his alma mater, KAIST, as an associate professor at its School of Electrical Engineering in 2019. There, he began exploring how the transformer model, the foundation of today’s AI models, could be optimized with specialized hardware capable of running large language models (LLMs) end to end.
 
After he presented the research at Hot Chips, AMD engineers approached him to praise the work, telling him he was on the right track, he recalled.
 
“2022 was the year the ChatGPT boom was ignited,” Kim reminisced. “I impulsively felt that might be my last chance to start a company based on this research.”
 
Kim founded Hyper Accel in January 2023, raising 55 billion won ($36.9 million) in Series A funding. The company is now undergoing a Series B round expected to close by June. 
 
The startup has about 80 employees, 65 of whom are chip engineers. While working toward the Bertha 500's mass production, the same engineering team is simultaneously developing a second AI chip, a much smaller on-device LPU, with LG Electronics. The chip, code-named Bertha 100, will be roughly one-tenth the size of the Bertha 500.
 
It will enable home appliances to understand and execute human language commands, and could eventually power humanoid robots. Unlike Bertha 500, which is being manufactured by Samsung, the Bertha 100 chips will be produced using TSMC’s 6-nanometer process.
 
The following are excerpts from the interview on Kim’s vision for his company, edited for length and clarity.
 


Q. How do you believe the Bertha 500 can top the performance of existing GPUs?
A. GPUs are designed as general-purpose processors, with thousands of small cores that can handle many types of AI workloads, from training to inference. But when running LLMs, this architecture requires constant data transfers between memory buffers and its many small cores, creating large inefficiencies.
 
Hyper Accel takes a different approach. Its LPU chips are built specifically for LLM inference, using dozens of much larger cores that can execute the entire inference pipeline within each core. Data flows through the cores only once in a streamlined pipeline, greatly reducing the back-and-forth data movement common in GPUs.
 
While many AI chips rely on high bandwidth memory (HBM) to overcome memory bottlenecks, Hyper Accel instead uses low-power double data rate (LPDDR) memory combined with a dataflow-optimized architecture. Although LPDDR offers lower peak bandwidth, the design allows the chip to utilize around 90 percent of its available bandwidth, compared to roughly 50 percent utilization in typical GPU systems.
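The utilization figures above imply a simple break-even calculation. In the sketch below, only the 90 percent and 50 percent utilization rates come from the interview; the peak-bandwidth numbers are hypothetical, chosen purely to illustrate how higher utilization narrows a raw-bandwidth gap:

```python
def effective_bandwidth(peak_gb_s: float, utilization: float) -> float:
    """Usable bandwidth is peak bandwidth times achieved utilization."""
    return peak_gb_s * utilization

# Utilization rates cited in the interview.
LPU_UTIL = 0.90   # dataflow-optimized LPDDR design
GPU_UTIL = 0.50   # typical GPU system

# Through utilization alone, the LPU design can close this much of a
# raw peak-bandwidth deficit before falling behind:
breakeven = LPU_UTIL / GPU_UTIL
print(f"break-even peak ratio: {breakeven:.1f}x")  # 1.8x

# Hypothetical peak figures for illustration only (not from the article).
print(effective_bandwidth(800.0, LPU_UTIL))   # LPDDR card: 720.0 GB/s usable
print(effective_bandwidth(1600.0, GPU_UTIL))  # HBM GPU: 800.0 GB/s usable
```

In other words, at these utilization rates an LPDDR design can stay within reach of an HBM part with up to 1.8 times its peak bandwidth; the claimed cost advantage then comes on top of that.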
 
By maximizing efficiency rather than simply adding more hardware resources, I believe our LPU architecture can deliver up to five times better tokens-per-second performance relative to cost for LLM inference workloads.
 
Hyper Accel's first AI chip, code-named Bertha 500, for data centers, co-developed with Naver Cloud [PARK SANG-MOON]

 
You originally chose LPDDR to lower costs. With memory prices rising recently, does that make you reconsider the memory strategy for future chips?
We initially chose LPDDR memory because it had a clear cost advantage over HBM. Recently, though, DDR prices have risen quite a bit, so that gap has narrowed somewhat. Even so, LPDDR is still significantly cheaper than HBM, and remains a more practical option when it comes to scaling production.
 
Looking ahead, there are also new memory technologies emerging. One is 3-D IC, where memory is stacked directly on top of the logic chip. Another approach involves stacking flash memory instead of DRAM, sometimes referred to as HB-Flash. These are widely seen as the next directions for memory architecture, and we’re actively looking into both.
 
If we stay focused purely on cost efficiency, the flash-based approach could make sense. But in this industry, there’s also an advantage to moving early on new technologies to capture the market. So we’re also evaluating 3-D IC as part of our longer-term road map.
 
 
How does the shift from training to inference change the AI hardware landscape?
AI is moving from generative AI to agent-based systems and eventually to physical AI, but the core technology behind all of it is still the transformer model. Systems like GPT, Claude and Gemini may differ in details, but they are all fundamentally built on that same architecture.
 
Our chips are optimized for transformers, but they’re still programmable, which means they can adapt as models and software evolve. To make the system easier for developers to use, we also provide a full software stack, including general and model compilers.
 
Another point people often raise is Nvidia's CUDA. Some argue GPUs can’t be replaced because of CUDA’s programmability, and that’s partly true — but mostly in the context of training. Training requires highly specialized programmers, which is why GPUs dominate that area.
 
Inference is a different story. Developers running AI services usually rely on existing frameworks rather than writing low-level CUDA code. For example, they might take an open model from Hugging Face and deploy it using a service framework like vLLM, simply by configuring parameters.
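The workflow Kim describes, pulling an open model from Hugging Face and serving it through vLLM by setting a few parameters, looks roughly like this (the model name and sampling values are illustrative choices, not from the interview; running it requires a GPU and `pip install vllm`):

```python
# Sketch of deploying an open Hugging Face model with vLLM.
# No low-level CUDA code is written; the developer only picks a model
# and configures sampling parameters.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # pulled from Hugging Face
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["What is a language processing unit?"], params)
print(outputs[0].outputs[0].text)
```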
 
In that kind of environment, even developers without deep hardware knowledge can launch AI services. From the perspective of LLM deployment, CUDA isn’t necessarily the standard any more.
 
Our software stack is designed specifically for LLM inference services, so that developers can run models without writing hardware-specific code such as CUDA.
 
Hyper Accel CEO Kim Joo-young poses for a photo after an interview with the Korea JoongAng Daily on March 10. [PARK SANG-MOON]


 
What is your vision for the company?
When we speak with investors, we often say our initial goal is to capture about 1 percent of the global market. The overall AI accelerator market is currently estimated at around $350 billion, and even if you take a conservative view of the inference chip segment, it’s projected to reach roughly $128 billion.
 
So even a 1 percent share of that market would translate into more than $1 billion in revenue. That’s the scale of opportunity we’re targeting as our first major milestone.
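The arithmetic behind that milestone, using the article's own figures, is straightforward:

```python
# Revenue implied by a 1 percent share of the projected inference chip market.
inference_market = 128e9  # projected inference chip segment, in dollars
target_share = 0.01       # the stated 1 percent goal

revenue = inference_market * target_share
print(f"${revenue / 1e9:.2f} billion")  # $1.28 billion, i.e. more than $1 billion
```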

BY LEE JAE-LIM [lee.jaelim@joongang.co.kr]
