INDEX
Explanations
specific names or terms associated with individuals or locations
proper nouns referring to individuals or specific entities
Negative Logits
Token       Logit
dens        -0.67
recomb      -0.67
dawn        -0.64
crim        -0.59
            -0.58
converge    -0.56
conserv     -0.56
ymm         -0.54
Prof        -0.54
Migration   -0.54
Positive Logits
Token       Logit
's          0.73
himself     0.73
orst        0.72
tsy         0.72
hyde        0.70
ouf         0.70
kowski      0.70
accuser     0.69
neau        0.68
kson        0.67
Activations Density: 0.271%