INDEX
Explanations
the pronoun "he" in various contexts
Negative Logits
Token          Logit
ยะ             -0.15
ANTS           -0.14
(blank)        -0.14
omba           -0.14
rella          -0.14
(blank)        -0.14
Interactive    -0.13
omik           -0.13
relude         -0.13
ritt           -0.13
Positive Logits
Token          Logit
/her           0.23
/she           0.20
panic          0.15
GBK            0.15
Ä±ÅŁÄ±k         0.15
env            0.15
aler           0.14
unma           0.14
inerary        0.14
idelberg       0.14
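
The source does not say how the negative and positive logit lists are derived. A common approach in dashboards of this kind is to project a feature's output direction through the model's unembedding and rank tokens by the resulting score. The sketch below assumes that approach. The names `logit_effects` and `top_and_bottom`, the toy vocabulary, and all vectors are hypothetical and are not taken from this page.

```python
# Minimal sketch, assuming each token's logit effect is the dot product of a
# feature direction with that token's unembedding vector. All values are toy data.

def logit_effects(direction, unembed):
    """Return {token: dot(direction, unembedding vector of token)}."""
    return {
        token: sum(d * w for d, w in zip(direction, vector))
        for token, vector in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    most_positive = ranked[:k]
    most_negative = ranked[-k:][::-1]  # most negative first
    return most_positive, most_negative


# Hypothetical 2-dimensional feature direction and 4-token vocabulary.
direction = [1.0, -0.5]
unembed = {
    "tok_a": [0.2, -0.06],
    "tok_b": [0.2, 0.0],
    "tok_c": [-0.1, 0.08],
    "tok_d": [-0.1, 0.06],
}

positive, negative = top_and_bottom(logit_effects(direction, unembed), k=2)
for token, score in positive + negative:
    print(f"{token}\t{score:+.2f}")
```

Under this assumption, a positive score means the feature pushes the model toward predicting that token, and a negative score means it pushes away from it.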
Activations Density 0.241%
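
The page does not define activation density. A usual reading is the fraction of sampled token positions on which the feature is active. The sketch below assumes that definition. The function name `activation_density`, the zero threshold, and the sample values are hypothetical.

```python
# Minimal sketch, assuming density = (positions with activation > threshold) / (all positions).

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for value in activations if value > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 2 of which are active.
sample = [0.0] * 998 + [1.7, 0.4]
print(f"{activation_density(sample):.3%}")
```

Read this way, a density of 0.241% corresponds to the feature being active on roughly 2 to 3 of every 1000 sampled tokens.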