Explanations
token identification and usage
Negative Logits
ure      1.19
ts       1.18
t        1.10
y        1.03
taining  1.02
,        1.02
gt       1.00
ty       0.98
ti       0.94
in       0.92
Positive Logits
Token     1.20
Token     1.16
令牌      1.11  (Chinese for "token")
getToken  1.08
م         1.06
ד         1.01
Tokens    1.00
TOKEN     0.99
esperado  0.95  (Spanish/Portuguese for "expected")
من        0.93
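Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring vocabulary entries. The sketch below assumes that convention; it is not confirmed to be how this dashboard computes its values. The names `decoder_row`, `unembed`, `feature_logits`, and `top_and_bottom_tokens` are hypothetical, and plain lists stand in for real weight tensors.

```python
from typing import List, Sequence, Tuple


def feature_logits(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],
) -> List[float]:
    """Project one feature's decoder direction (length d_model) through an
    unembedding matrix (d_model rows x vocab_size columns), giving one
    logit contribution per vocabulary token.

    Assumption: this mirrors the usual "decoder row times W_U" convention;
    it is an illustrative sketch, not the dashboard's actual code.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][t] for i in range(len(decoder_row)))
        for t in range(vocab_size)
    ]


def top_and_bottom_tokens(
    logits: Sequence[float], vocab: Sequence[str], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, logit) pairs,
    each list ordered from strongest to weakest effect."""
    order = sorted(range(len(logits)), key=lambda t: logits[t])
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy usage: d_model = 2, vocabulary of 3 tokens (all values made up).
decoder_row = [1.0, -0.5]
unembed = [[0.5, 1.0, -0.25], [1.0, 0.0, 1.5]]
vocab = ["a", "Token", "b"]
logits = feature_logits(decoder_row, unembed)  # [0.0, 1.0, -1.0]
positive, negative = top_and_bottom_tokens(logits, vocab, k=1)
print(positive, negative)  # [('Token', 1.0)] [('b', -1.0)]
```

With real weights, `k=10` would give lists of the same length as the ones shown here.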
Activation density: 0.024%
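Activation density is typically the percentage of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. That threshold is an assumption, and so is the function name `activation_density`. Under this definition, 0.024% corresponds to roughly 24 active positions per 100,000 tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > 0;
    the dashboard's exact threshold is not stated on the page.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


# Toy usage: 24 active positions out of 100,000 sampled tokens.
acts = [1.0] * 24 + [0.0] * 99_976
print(activation_density(acts))  # 0.024
```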