Explanations
pronouns and verbs indicating relational dynamics or agency in context
Negative Logits
cona -0.18
antha -0.17
Acres -0.16
hos -0.15
切り -0.15
φα -0.15
iyan -0.15
ाहर -0.15
witch -0.15
meis -0.15
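Some vocabulary entries can appear in their raw byte-level form, such as `åĪĩãĤĬ`, which is how a GPT-2-style byte-level BPE vocabulary stores the UTF-8 bytes of 切り. The source does not name the tokenizer, so treat this as an assumption. The sketch below rebuilds GPT-2's reversible byte-to-character table and inverts it, which is enough to turn such strings back into readable text:

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style reversible table mapping each byte to a printable character."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Bytes without a printable Latin-1 glyph are shifted to code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Invert the table: printable character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a raw byte-level vocab string back into readable UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # errors="replace" keeps tokens that end mid-character from raising.
    return raw.decode("utf-8", errors="replace")


print(decode_token("åĪĩãĤĬ"))  # 切り
print(decode_token("ÏĨ"))      # φ
print(decode_token("ĠAcres"))  # " Acres" (Ġ encodes a leading space)
```

The same table explains why leading spaces are invisible in a scraped list: they are stored as `Ġ` and are often stripped on display.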
Positive Logits
oke 0.17
Dyn 0.15
undo 0.15
ritten 0.14
kin 0.14
erable 0.14
istol 0.14
oc 0.14
eb 0.14
EB 0.14
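Dashboards of this kind commonly derive the positive and negative logits by projecting the feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. That is the feature's direct effect on each output token. The NumPy sketch below assumes hypothetical random weights `W_dec` and `W_U` and a made-up vocabulary in place of the real SAE and model, and it ignores any final layer-norm scaling:

```python
import numpy as np


def top_logit_effects(w_dec_row: np.ndarray, W_U: np.ndarray, vocab: list[str], k: int = 10):
    """Return the k most positive and k most negative (token, logit) pairs.

    w_dec_row: decoder direction for one feature, shape (d_model,)
    W_U: unembedding matrix, shape (d_model, vocab_size)
    """
    logits = w_dec_row @ W_U  # direct effect on each vocab logit, shape (vocab_size,)
    order = np.argsort(logits)  # ascending order of logit effect
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Hypothetical toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 50
W_dec = rng.normal(size=(4, d_model))  # decoder rows for 4 hypothetical features
W_U = rng.normal(size=(d_model, vocab_size))
vocab = [f"tok{i}" for i in range(vocab_size)]

pos, neg = top_logit_effects(W_dec[2], W_U, vocab, k=3)
print("POSITIVE", pos)
print("NEGATIVE", neg)
```

Because only the decoder direction and unembedding are involved, these lists describe what the feature pushes the output toward, not the contexts in which it fires.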
Activation Density: 0.001%
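Activation density is usually the fraction of tokens in a sample on which the feature fires (activation above zero), so 0.001% corresponds to about one token in 100,000. A minimal sketch, assuming a hypothetical array of per-token activations and a zero threshold:

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    return float(np.mean(acts > threshold))


# Hypothetical example: 200,000 tokens, the feature fires on exactly 2 of them.
acts = np.zeros(200_000)
acts[[12_345, 150_000]] = [3.1, 0.7]

density = activation_density(acts)
print(f"{density:.3%}")  # 0.001%
```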