Explanations
third-person pronouns referring to male individuals
Negative Logits
FUL          -0.17
ÑĤÑı         -0.14
azer         -0.14
ãĥ³ãĤ¬       -0.14
Consort      -0.14
_READONLY    -0.13
ful          -0.13
urdu         -0.13
enso         -0.13
:"",↵        -0.13
Positive Logits
or           0.25
/her         0.22
/she         0.17
auer         0.17
enberg       0.17
/h           0.15
idi          0.15
idian        0.15
/            0.15
jes          0.15
Activations Density 0.031%