Five things Korean SLMs do better than GPT-5

Since late 2025, call volume across the Corepin lineup has grown quickly. Analyzing those call patterns, we found five areas where Korean SLMs outperform GPT-5 and Claude Opus 4.7 on the same task. Here they are, one at a time.

1. Korean receipt OCR — 2×+ accuracy

On the CORD Korean receipt benchmark, Corepin OCR scored 59.4%. In the same measurement, Claude Opus 4.7 scored 28.2%, Gemini 3 Pro 24.6%, and GPT-5.2 16.8%. That puts Corepin first, at about 2.1× the runner-up and about 3.5× GPT-5.2.

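Why the gap? Korean receipts mix dancheong-style colors, spaced-out Sino-Korean abbreviations such as "부 가 세" (VAT) and "면 세 합 계" (tax-exempt total), narrow glyphs, and uneven print quality. That distribution is out-of-distribution (OOD) for English-trained models. Corepin hand-labeled 1M Korean receipts over 8 years and trained on them. That volume of labeled data is hard for anyone else to match.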

2. Korean mental-health signals: 22/22 acute detection

Here are results from an 8-week measurement at one CX chatbot operator. There were 22 acute signals (suicidal ideation, self-harm). Global LLMs caught 14 of the 22 on average. Corepin M/04 caught all 22, with 0 false positives.

What matters is Korean indirect phrasing, such as "그냥 사라지고 싶다" ("I just want to disappear") or "다음 주에 멀리 가야겠어요" ("I need to go far away next week"). Translated literally into English, these lose their meaning. Catching them takes data labeled directly from Korean mental-health cases.
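To make the counts above easier to compare, here is a minimal sketch that turns them into recall and precision. The function names are ours for illustration, and the only inputs are the figures quoted in this section (22 acute signals, 14 caught on average by global LLMs, 22 caught by Corepin M/04 with 0 false positives).

```python
def recall(true_positives: int, total_positives: int) -> float:
    """Share of real acute signals that were caught."""
    return true_positives / total_positives


def precision(true_positives: int, false_positives: int) -> float:
    """Share of raised alerts that were real acute signals."""
    return true_positives / (true_positives + false_positives)


TOTAL_ACUTE = 22  # acute signals seen during the 8-week measurement

global_llm_recall = recall(14, TOTAL_ACUTE)  # average across global LLMs
corepin_recall = recall(22, TOTAL_ACUTE)     # Corepin M/04
corepin_precision = precision(22, 0)         # 0 false positives reported

print(f"Global LLM recall (average): {global_llm_recall:.1%}")
print(f"Corepin M/04 recall: {corepin_recall:.1%}")
print(f"Corepin M/04 precision: {corepin_precision:.1%}")
```

With 22 cases the sample is small, so these percentages are best read as a summary of this one measurement.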

3. HWP/HWPX parsing — 18 formats, one API

60% of the internal documents Korean companies have accumulated over 30 years are in HWP-family formats (HWP, HWPX, HWPML). Microsoft doesn't support HWP. Neither does Adobe or Google. Corepin's unified document parser handles 18 formats, including HWP, HWPX, DOCX, XLSX, PPTX, and PDF, through one API.

At one large manufacturer, search across 30 years of approval documents became 5.6× faster, and new-hire onboarding time was cut in half. A company that relies only on global models cannot get this result.
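For a feel of what "one API for many formats" involves, here is a rough sketch of the first step such a parser might take: sniffing the container type from the file's leading bytes. This is our illustration, not Corepin's implementation. The function name and return labels are hypothetical. It relies only on well-known signatures: PDF files start with `%PDF-`, binary HWP 5.x files are OLE compound files (as are legacy Office files), and DOCX, XLSX, PPTX, and HWPX are all ZIP archives.

```python
import io
import zipfile

PDF_MAGIC = b"%PDF-"
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file header
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header


def sniff_container(data: bytes) -> str:
    """Return a coarse container label for a document (hypothetical labels)."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(OLE_MAGIC):
        # Binary HWP 5.x and legacy Office files share this container,
        # so a real parser would inspect the streams inside.
        return "ole-compound"
    if data.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
        if any(name.startswith("word/") for name in names):
            return "docx"
        if any(name.startswith("xl/") for name in names):
            return "xlsx"
        if any(name.startswith("ppt/") for name in names):
            return "pptx"
        # Other ZIP-based formats (HWPX among them) need a closer look.
        return "zip-other"
    return "unknown"
```

The sketch stops at the container on purpose. Telling HWP apart from legacy Office, or HWPX apart from other ZIP packages, is format-specific work that comes after this step.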

4. PII masking — 89% fewer false positives

This was measured at one commercial bank. English PII tools used to split a single-line Korean address ("서울 강남구 봉은사로 524", that is, 524 Bongeunsa-ro, Gangnam-gu, Seoul) into five wrong matches. After switching to Corepin M/01, false positives fell 89% and the security team's review time fell 70%.

Korean PII differs completely from English in glyph patterns, digit counts, and the way Sino-Korean words combine. Fine-tuning an English-trained model on Korean is not enough to catch it.
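To illustrate the address case, here is a small sketch that treats a Korean road-name address as one span instead of several fragments. It is a toy regular expression we wrote for this post, not how Corepin M/01 works. The pattern, the city list, and the `[ADDRESS]` placeholder are all assumptions, and real addresses have many more variants than this covers.

```python
import re

# Hypothetical pattern: city, district (구/군), road name ending in 로/길,
# then a building number such as 524 or 12-3.
ROAD_ADDRESS = re.compile(
    r"(?:서울|부산|대구|인천|광주|대전|울산|세종)"
    r"(?:특별시|광역시|특별자치시|시)?\s+"
    r"[가-힣]+(?:구|군)\s+"
    r"[가-힣0-9]+(?:로|길)\s+"
    r"\d+(?:-\d+)?"
)


def mask_addresses(text: str, placeholder: str = "[ADDRESS]") -> str:
    """Replace each road-name address with a single placeholder."""
    return ROAD_ADDRESS.sub(placeholder, text)


sample = "주소: 서울 강남구 봉은사로 524 입니다"
matches = ROAD_ADDRESS.findall(sample)
print(len(matches), mask_addresses(sample))
```

The point of the sketch is the unit of detection: the whole line is one address, so it should produce one match and one mask.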

5. Cost: as low as 1/20 of an LLM

These are global averages measured by InfoWorld and Label Your Data. For the same workload, LLM API calls cost $5K–50K per month and a self-hosted SLM costs $500–2K per month. That is a 5–20× gap.

Corepin's per-call price is ₩2–10. The same task on GPT-5 costs ₩40–120 per call. For a workload of 100K calls per month, GPT-5 comes to ₩4M–12M and Corepin to ₩200K–1M. It's a crude comparison, but domain accuracy is higher while cost is roughly 12–20× lower when like ends of each range are compared.
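The monthly figures above are simple multiplication, and this short sketch reproduces them. The constant and function names are ours. The inputs are the per-call prices and the 100K calls per month quoted in this section.

```python
CALLS_PER_MONTH = 100_000

# Per-call price ranges in KRW, as (low, high).
COREPIN_PER_CALL = (2, 10)
GPT5_PER_CALL = (40, 120)


def monthly_cost(per_call_krw: int, calls: int = CALLS_PER_MONTH) -> int:
    """Monthly cost in KRW for a given per-call price."""
    return per_call_krw * calls


corepin_monthly = tuple(monthly_cost(p) for p in COREPIN_PER_CALL)
gpt5_monthly = tuple(monthly_cost(p) for p in GPT5_PER_CALL)

# Compare like ends of each range: low with low, high with high.
ratio_low_end = gpt5_monthly[0] / corepin_monthly[0]
ratio_high_end = gpt5_monthly[1] / corepin_monthly[1]

print(f"Corepin: ₩{corepin_monthly[0]:,} to ₩{corepin_monthly[1]:,}")
print(f"GPT-5:   ₩{gpt5_monthly[0]:,} to ₩{gpt5_monthly[1]:,}")
print(f"Ratio: {ratio_high_end:.0f}x to {ratio_low_end:.0f}x")
```

Comparing opposite ends of the ranges would widen the spread, so treat the ratio as a rough guide.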

That said, LLMs aren't dying

LLMs are still the best tool for solving almost any task reasonably well. For exploring new domains, creative work, and hard reasoning, frontier LLMs still win. We use BizRouter ourselves to send 80% of requests to SLMs and the hard 20% to Claude or GPT.

Using both together is the answer. "LLM only" is wrong. "SLM only" is wrong. What Korean enterprises actually need is a router like BizRouter that branches automatically.
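As a rough picture of what automatic branching can look like, here is a minimal routing sketch. It is not BizRouter's actual logic. The task names, the `slm_confidence` field, and the 0.8 threshold are all hypothetical, and the threshold has nothing to do with the 80/20 traffic split mentioned above.

```python
from dataclasses import dataclass

# Hypothetical set of tasks where a domain SLM is the first choice.
SLM_TASKS = {"receipt_ocr", "mental_health_signal", "document_parsing", "pii_masking"}


@dataclass
class Request:
    task: str              # what kind of work this request is
    slm_confidence: float  # assumed 0.0-1.0 score that the SLM can handle it


def route(request: Request, min_confidence: float = 0.8) -> str:
    """Return "slm" for known domain tasks the SLM is confident on, else "llm"."""
    if request.task in SLM_TASKS and request.slm_confidence >= min_confidence:
        return "slm"
    return "llm"


print(route(Request(task="receipt_ocr", slm_confidence=0.95)))
print(route(Request(task="open_ended_research", slm_confidence=0.95)))
print(route(Request(task="pii_masking", slm_confidence=0.40)))
```

The design choice to notice is the fallback: anything outside the known domain tasks, or anything the SLM is unsure about, goes to the frontier LLM by default.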

All numbers come from Corepin production data, customer measurements, and public benchmarks (CORD, OmniDocBench v1.5, K-MHaS, InfoWorld 2026). Want to run the same measurements at your company? Book a demo.

— Kyungmin Park, CTO, AI Products
