INDEX
Explanations
repeated first-person possessive pronouns
Negative Logits
feroit       -0.40
kobieta      -0.35
escla        -0.35
poveznice    -0.34
provoquer    -0.34
kvinnor      -0.33
vább         -0.33
näm          -0.33
surgió       -0.32
planche      -0.32
Positive Logits
own          1.30
自己的       0.84
their        0.79
his          0.79
Their        0.77
ihrer        0.77
자신의       0.76
Own          0.75
Own          0.75
Their        0.75
Activations Density: 0.649%