Explanations
connections between verbs and their subjects or modifiers
Negative Logits
Token     Logit
pii       -0.17
ABC       -0.16
623       -0.16
ouden     -0.15
ĵåIJį      -0.15
å³°       -0.15
roud      -0.15
oute      -0.14
olum      -0.14
ABC       -0.14
Positive Logits
Token     Logit
ughs      0.17
ernen     0.16
izik      0.15
áÄį       0.15
plorer    0.15
cci       0.15
enis      0.15
engkap    0.14
Plug      0.14
angu      0.14
Activations Density: 0.001%