Explanations
reflexive pronouns in various contexts
Negative Logits
Token        Logit
icip         -0.17
ož           -0.15
cole         -0.15
ecute        -0.14
oes          -0.14
antu         -0.14
ừ            -0.14
_topology    -0.13
icipation    -0.13
erle         -0.13
Positive Logits
Token        Logit
uls          0.22
conde        0.20
ared         0.19
ules         0.19
ãĥ³ãĥĩãĤ£    0.16
ign          0.16
content      0.16
Hutchinson   0.16
vez          0.15
enth         0.15
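The positive-logit token `ãĥ³ãĥĩãĤ£` looks like the raw vocabulary form used by GPT-2 style byte-level BPE tokenizers, where each byte of the UTF-8 text is shown as a stand-in printable character. The sketch below is a minimal illustration under that assumption (the page does not say which tokenizer the model uses). It rebuilds the standard GPT-2 byte-to-character table and inverts it. The function names `gpt2_bytes_to_unicode` and `decode_token` are hypothetical helpers written for this example, not part of any dashboard API.

```python
def gpt2_bytes_to_unicode():
    """Rebuild the GPT-2 byte -> printable-character table.

    Printable bytes map to themselves; every other byte is shifted
    to a code point at or above 256 so it is always visible.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in gpt2_bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Decode a byte-level BPE vocabulary string to readable text.

    Assumption: the token uses the GPT-2 byte-to-character mapping.
    If any character falls outside that mapping, the token is
    returned unchanged, since it is probably already decoded.
    """
    if any(ch not in UNICODE_TO_BYTE for ch in token):
        return token
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥ³ãĥĩãĤ£", "_topology", "ừ"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption the token decodes to the katakana sequence `ンディ`. Plain ASCII tokens such as `_topology` pass through unchanged, and so do tokens that are already shown in decoded form, such as `ừ`.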
Activations Density: 0.004%