Explanations
references to self-identity
Negative Logits
er             -0.84
palen          -0.67
erent          -0.66
位の           -0.65
Den            -0.65
TRI            -0.64
ER             -0.64
PreAuthorize   -0.63
o              -0.61
czeko          -0.59
Positive Logits
myself         2.20
yourself       2.11
myself         2.09
herself        2.02
ourselves      1.97
himself        1.96
Yourself       1.92
Myself         1.87
herself        1.85
Himself        1.84
Activations Density 0.058%