INDEX
Explanations
punctuation marks, particularly periods
Negative Logits
幹  -0.16
ixo  -0.15
Sher  -0.15
itung  -0.14
orro  -0.14
やす  -0.14
ost  -0.14
885  -0.14
stry  -0.14
istra  -0.14
Positive Logits
ίων  0.15
phis  0.15
errat  0.15
روز  0.15
osci  0.15
álo  0.15
biên  0.14
森  0.14
woff  0.14
nton  0.14
Activations Density 0.004%