Explanations
instances of the pronoun "he" in various forms
Negative Logits
dad         -0.17
anza        -0.15
epam        -0.14
atories     -0.14
gne         -0.14
alling      -0.14
ategy       -0.14
mand        -0.14
angement    -0.14
_rng        -0.13
Positive Logits
uddy           0.16
owler          0.16
ritten         0.15
HTTPRequest    0.14
Shade          0.14
ź              0.14
CONN           0.14
ấu             0.14
Blog           0.14
ugins          0.13
Activations Density: 0.020%