INDEX
Explanations
pronouns and modal verbs indicating capability or possibility
Negative Logits
iez     -0.17
ie      -0.15
Pant    -0.14
Vers    -0.14
Att     -0.14
att     -0.14
Mon     -0.14
cano    -0.14
orny    -0.14
cou     -0.14
Positive Logits
опол         0.16
解           0.15
户           0.15
antino       0.15
speculation  0.14
Fault        0.14
ogi          0.14
skyt         0.14
ška          0.14
-mf          0.14
Activations Density: 0.003%