INDEX
Explanations
variations of the pronoun "he" in different contexts
Negative Logits
vyk           -0.15
TestCategory  -0.15
ray           -0.15
ibase         -0.15
uji           -0.14
cf            -0.14
Wel           -0.14
esc           -0.14
apult         -0.14
sm            -0.14
Positive Logits
inner         0.18
hi            0.17
asers         0.17
öff           0.17
erb           0.16
asmus         0.16
inn           0.16
igon          0.16
immer         0.15
aser          0.15
Activations Density: 0.007%