Explanations
high-frequency function words and connectors that structure sentences
Negative Logits
Token       Logit
ãĥ³ãĤ¸      -0.17
phies       -0.16
ucz         -0.15
argas       -0.15
ulet        -0.15
fetisch     -0.14
loi         -0.14
libs        -0.14
_COMPAT     -0.14
UGC         -0.14
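The first entry in the negative list looks like a token printed in a tokenizer's raw byte-level form rather than as readable text. The sketch below assumes (the page does not say so) that the vocabulary follows the GPT-2-style byte-level BPE convention, in which every byte is mapped to a printable character. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not taken from the page.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character map, following the GPT-2 byte-level convention (assumed)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    mapping = {b: chr(b) for b in printable}
    offset = 0
    for b in range(256):
        if b not in mapping:
            # Non-printable bytes are shifted to code points starting at 256.
            mapping[b] = chr(256 + offset)
            offset += 1
    return mapping


def decode_token(token: str) -> str:
    """Map each displayed character back to its byte, then decode the bytes as UTF-8."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ³ãĤ¸"))
```

If that assumption holds, the entry decodes to the katakana pair ンジ. Tokens made only of ASCII letters, such as the other entries in the list, pass through unchanged.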
Positive Logits
Token       Logit
oup         0.17
quint       0.15
Y           0.15
Spe         0.14
ela         0.14
Tun         0.14
tun         0.14
            0.13
ift         0.13
Att         0.13
Activations Density 0.001%