Explanations
instances where the word "he" is preceded by different characters
occurrences of the pronoun "he."
Negative Logits
Token          Logit
hips           -0.94
eleph          -0.68
Bundes         -0.63
ãĥ¯            -0.63
rador          -0.62
Bull           -0.60
GEAR           -0.60
domestically   -0.60
ertodd         -0.59
yarn           -0.58

Positive Logits
Token          Logit
isure          1.11
ather          0.99
lling          0.96
ALTH           0.95
rette          0.94
ller           0.93
aton           0.91
ugh            0.90
dule           0.89
itage          0.89
Activations Density: 0.067%