Strengths, Limits, and the Recall Wall

Where Hyena shines

Hyena is at its best on workloads that look like genomics: very long sequences, patterns more regular than natural language, and tolerance for some loss of exact retrieval. Concretely:

Genomic modeling: DNA, RNA, and protein sequences. Evo 2 is the canonical example; HyenaDNA is its predecessor.
Byte-level modeling: operating directly on bytes (no tokenizer), where sequences run roughly 4× longer than their tokenized form.
Time-series forecasting: data with strong periodic structure, where the convolutional inductive bias helps.
Long-context document processing: tasks that call for capturing statistical patterns rather than retrieval.
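
To make the byte-level case concrete, here is a minimal sketch of what "no tokenizer" means in practice: the model's input ids are simply the UTF-8 bytes, so the vocabulary has 256 entries. The helper names and the `BYTES_PER_TOKEN = 4` constant are illustrative assumptions; the constant just restates the rough 4× figure above and is not a property of any particular tokenizer.

```python
import math

# Assumption: a subword token covers about 4 bytes on average (the "4x" figure above).
BYTES_PER_TOKEN = 4


def to_byte_ids(text):
    """Byte-level 'tokenization': every UTF-8 byte is its own id in [0, 255]."""
    return list(text.encode("utf-8"))


def approx_token_count(text):
    """Rough size of the same text under a subword tokenizer, given the assumption above."""
    return math.ceil(len(to_byte_ids(text)) / BYTES_PER_TOKEN)


if __name__ == "__main__":
    sample = "ACGTACGTTTGACA"
    ids = to_byte_ids(sample)
    print("byte ids:", ids)
    print("byte-level length:", len(ids))
    print("approx. subword length:", approx_token_count(sample))
```

The practical consequence is that a byte-level model needs about four times the context length to cover the same text, which is where a sub-quadratic mixer pays off.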

Where Hyena struggles

On natural-language modeling at standard scales, Hyena is competitive but does not dominate. At the 360M scale, Hyena reaches a perplexity of 10.11 versus 8.39 for attention: close, but a real gap. The gap narrows at larger scales without closing completely, and Mamba-family architectures are generally ahead on language tasks.
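
A perplexity gap is easier to judge once it is converted to a per-token loss difference, since perplexity is the exponential of the average cross-entropy. The sketch below does that arithmetic for the two numbers quoted above; the function names are illustrative. It works out to roughly 0.19 nats per token, or about 20% higher perplexity.

```python
import math


def loss_gap_nats(ppl_a, ppl_b):
    """Difference in average cross-entropy (nats/token) implied by two perplexities."""
    return math.log(ppl_a) - math.log(ppl_b)


def relative_ppl_increase(ppl_a, ppl_b):
    """Fractional increase of ppl_a over ppl_b."""
    return ppl_a / ppl_b - 1.0


if __name__ == "__main__":
    hyena_ppl, attention_ppl = 10.11, 8.39  # figures quoted in the text
    print(f"loss gap: {loss_gap_nats(hyena_ppl, attention_ppl):.3f} nats/token")
    print(f"relative perplexity increase: {relative_ppl_increase(hyena_ppl, attention_ppl):.1%}")
```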

The biggest practical limit is recall. On recall-intensive benchmarks, Hyena scores about 5.1 points against 47.7 for attention; that is, Hyena is roughly an order of magnitude weaker at retrieving a specific past token. It is also weaker than Mamba on the same task family, because Hyena's convolution-based mixing has no equivalent of selectivity: the filter is generated from position, not content.
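
The last point can be seen in a toy sketch. Here the filter is a fixed function of the position offset only (a decaying cosine stands in, as an assumption, for the learned implicit-filter FFN), and it is applied with a direct causal convolution. The weight that output position t gives to input position s depends only on t - s, never on what the token at s contains. This deliberately omits Hyena's element-wise gating and the FFT-based evaluation used at real sequence lengths.

```python
import math


def positional_filter(length, decay=0.3, freq=0.5):
    """Hypothetical stand-in for the implicit filter: a function of position t only."""
    return [math.exp(-decay * t) * math.cos(freq * t) for t in range(length)]


def causal_conv(x, h):
    """Direct O(L^2) causal convolution: y[t] = sum over s <= t of h[t - s] * x[s]."""
    return [sum(h[t - s] * x[s] for s in range(t + 1)) for t in range(len(x))]


if __name__ == "__main__":
    x_a = [1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
    x_b = [0.0, 3.0, 0.0, 0.0, 0.0, 1.0]

    # The filter is built before either input is seen.
    h = positional_filter(len(x_a))

    print("filter:", [round(v, 3) for v in h])
    print("y_a:   ", [round(v, 3) for v in causal_conv(x_a, h)])
    print("y_b:   ", [round(v, 3) for v in causal_conv(x_b, h)])
```

Because the same `h` is used for every input, the layer cannot decide to keep one past token sharp and let the others decay based on what they are. Attention does this through query-key scores, and Mamba through input-dependent state updates.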

Training fragility

Hyena is noticeably learning-rate sensitive, with a stable LR window even narrower than Mamba's. The implicit-filter FFN is sensitive to initialization, and convergence is brittle unless the published recipe is followed carefully. For a research lab that is a tax; for a production team it is a barrier. This is part of the reason Hyena has not seen broader language-modeling adoption beyond the StripedHyena line.

Exercise

Pick a recall-intensive benchmark (a LongBench QA task or a synthetic key-value retrieval set) and run a StripedHyena-7B checkpoint on it. Compare its score against a Llama-2 7B baseline. Then run the same checkpoint on a long-context summarization task (for example, GovReport). Expect the relative scores to flip: Hyena loses on QA and often wins or ties on summarization. That is the recall-vs-summarization axis, in numbers.
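
If you go the synthetic route, a key-value retrieval set can be generated in a few lines. The sketch below is one possible setup, not a standard benchmark: the prompt format, key and value shapes, and function names are all assumptions, and wiring the prompts into each model's generation API is left to you.

```python
import random
import string


def make_kv_example(num_pairs, rng):
    """Build one prompt listing key/value pairs, then ask for the value of one key."""
    keys = rng.sample(range(10_000, 100_000), num_pairs)  # unique 5-digit keys
    pairs = [
        (str(k), "".join(rng.choices(string.ascii_lowercase, k=6)))
        for k in keys
    ]
    query_key, answer = rng.choice(pairs)
    context = "\n".join(f"{k}: {v}" for k, v in pairs)
    prompt = f"{context}\n\nWhat is the value for key {query_key}?\nAnswer:"
    return prompt, answer


def build_set(num_examples, num_pairs, seed=0):
    """Deterministic list of (prompt, answer) tuples for a given seed."""
    rng = random.Random(seed)
    return [make_kv_example(num_pairs, rng) for _ in range(num_examples)]


def exact_match(predictions, answers):
    """Fraction of predictions that equal the reference after stripping whitespace."""
    hits = sum(p.strip() == a for p, a in zip(predictions, answers))
    return hits / len(answers)


if __name__ == "__main__":
    dataset = build_set(num_examples=3, num_pairs=5, seed=0)
    prompt, answer = dataset[0]
    print(prompt)
    print("expected:", answer)
```

Increasing `num_pairs` stretches the context and makes the lookup harder, so sweeping it shows where each model's recall starts to fall off.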
