Advancements in Anthropic's LLM Number Handling

Max Buckley, a software engineer at Google and AI researcher, shares his findings on how Anthropic's Claude models handle numbers differently from their competitors.

Integer tokenization has evolved significantly from the earlier methods used by models like GPT-2 and GPT-3, which relied on Byte Pair Encoding (BPE) and resulted in inconsistent and inefficient handling of numbers. Recent advancements have seen a shift towards more efficient strategies, such as tokenizing individual digits or sequences of up to three digits.
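
The split matters because it determines which numeric chunks a model sees repeatedly during training. A minimal sketch of the two modern strategies, assuming a simplified digit-grouping view of tokenization (illustrative only, not any provider's actual tokenizer code):

```python
import re

def digit_tokens(number: str) -> list[str]:
    # Single-digit strategy (Llama 1 & 2, Mistral): every digit is its own token.
    return list(number)

def three_digit_tokens_l2r(number: str) -> list[str]:
    # Up-to-three-digit strategy, scanned left to right (a greedy split from the start).
    return re.findall(r"\d{1,3}", number)

print(digit_tokens("1234567"))            # ['1', '2', '3', '4', '5', '6', '7']
print(three_digit_tokens_l2r("1234567"))  # ['123', '456', '7']
```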

A notable innovation is the adoption of Right to Left (R2L) tokenization, as seen in Anthropic's Claude 3 models. This method, unlike the traditional Left to Right (L2R) approach, tokenizes numbers from the least significant to the most significant digit. Research has shown that R2L tokenization significantly improves the performance of models on arithmetic tasks.
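
A minimal sketch of how R2L grouping differs from L2R on the same digit string (again a simplified illustration, not Anthropic's tokenizer implementation):

```python
def three_digit_tokens_r2l(number: str) -> list[str]:
    # Group into chunks of up to three digits starting from the least significant end,
    # so chunks line up with place value the way "1,234,567" does.
    chunks = []
    i = len(number)
    while i > 0:
        chunks.append(number[max(i - 3, 0):i])
        i -= 3
    return list(reversed(chunks))

print(three_digit_tokens_r2l("1234567"))  # ['1', '234', '567']  (R2L)
# Compare with the L2R split of the same string: ['123', '456', '7']
```

Because R2L chunks always align with thousands boundaries, the same three-digit token carries the same place value wherever it appears, which is the intuition behind the reported arithmetic gains.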

Key Insights:
- Historical Context: Older models like GPT-2 and GPT-3 used BPE, which led to arbitrary tokenization of numbers and forced models to memorize arithmetic facts inconsistently rather than learn general rules.
- Modern Strategies:
  - Tokenizing individual digits (used in smaller models like Mistral and Llama 1 & 2).
  - Tokenizing sequences of up to three digits (adopted by newer models like GPT-3.5, GPT-4, and Claude 3).
- R2L Tokenization Benefits: Demonstrated improved performance on arithmetic tasks, with models like Claude 3 showing superior results compared to competitors.
- Experimental Evidence: Testing showed significant improvements in arithmetic accuracy for models using R2L tokenization, particularly for large numbers.
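
The article summarizes the results rather than the test harness; a minimal sketch of one way to run such an arithmetic accuracy check against the Anthropic API (the model id, prompt wording, and scoring are assumptions, not the author's actual experiment):

```python
import random
import anthropic  # pip install anthropic; expects ANTHROPIC_API_KEY in the environment

client = anthropic.Anthropic()

def addition_accuracy(model: str, trials: int = 20, digits: int = 9) -> float:
    # Ask the model to add pairs of large random integers and score exact matches.
    correct = 0
    for _ in range(trials):
        a = random.randint(10 ** (digits - 1), 10 ** digits - 1)
        b = random.randint(10 ** (digits - 1), 10 ** digits - 1)
        reply = client.messages.create(
            model=model,
            max_tokens=32,
            messages=[{"role": "user",
                       "content": f"What is {a} + {b}? Reply with only the number."}],
        )
        answer = reply.content[0].text.strip().replace(",", "")
        correct += answer == str(a + b)
    return correct / trials

# Example: probe accuracy as operand size grows.
# print(addition_accuracy("claude-3-haiku-20240307", digits=12))
```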

Anthropic’s Implementation:
- Claude 3 models use R2L tokenization.
- Significant improvement in arithmetic tasks.
- Claude 3's smallest model outperforms flagship models from other providers on these arithmetic tasks.

Categories: Machine Learning
