INDEX
Explanations
verbs and prepositions that indicate a process or transformation
Negative Logits
logan    -0.18
BUM      -0.16
äs       -0.16
κÎŃ      -0.16
ekim     -0.15
quip     -0.15
онов     -0.15
quan     -0.15
iri      -0.15
han      -0.14
Positive Logits
ey       0.18
.datab   0.16
ady      0.16
eyJ      0.16
elic     0.15
elly     0.15
shot     0.15
ãĤ·ãĥ¼   0.14
its      0.14
anian    0.14
Activations Density 0.002%