Explanations
high-frequency function words and prepositions often found in logical or structured arguments
Negative Logits
Token       Logit
ëĿ½         -0.16
imson       -0.16
enchmark    -0.15
Werner      -0.15
791         -0.14
cub         -0.14
870         -0.14
cont        -0.14
uchs        -0.14
478         -0.14
Positive Logits
Token       Logit
ippi        0.18
ipple       0.16
ez          0.16
arc         0.15
rin         0.15
uelles      0.15
geb         0.14
arbon       0.14
oki         0.14
izio        0.14
Activations Density: 0.001%