INDEX
Explanations
personal pronouns and verbs related to one's own actions
references to a specific male individual or pronoun
Negative Logits
htaking      -0.69
earch        -0.69
endif        -0.67
anking       -0.66
endment      -0.65
history      -0.63
aphael       -0.62
ifference    -0.61
maxwell      -0.61
Bundes       -0.61
Positive Logits
'd           1.05
undertook    0.94
eded         0.94
redes        0.94
uristic      0.92
aped         0.91
wrote        0.90
considers    0.90
swore        0.89
dreamed      0.88
Activations Density 0.106%