Explanations
pronouns and their relationships to actions and states
Negative Logits
stanov      -0.15
ih          -0.15
orer        -0.14
mine        -0.14
aro         -0.14
Bras        -0.14
Trent       -0.14
-pages      -0.14
Long        -0.14
pages       -0.14
Positive Logits
Ink         0.15
quin        0.14
unresolved  0.14
/../        0.14
ìĹĦ         0.14
WI          0.14
moms        0.13
pol         0.13
unlike      0.13
Ñģли        0.13
Activations Density: 0.177%