INDEX
Explanations
names or references related to a person named "He."
proper nouns, particularly names of characters or significant figures
Negative Logits
elig         -0.77
lished       -0.75
ultras       -0.64
Ultr         -0.63
horr         -0.63
bluff        -0.61
ADRA         -0.60
nightmares   -0.57
clusions     -0.57
Rivals       -0.57
Positive Logits
igans        0.93
felt         0.87
kil          0.73
idan         0.73
frames       0.71
kens         0.71
chenko       0.69
ides         0.69
bal          0.67
kamp         0.67
Activations Density: 0.083%