Explanations
names related to specific regions or individuals
proper nouns, particularly names of places and characters within a narrative
Negative Logits
Token      Logit
hetical    -0.78
chens      -0.71
ded        -0.70
20439      -0.69
chy        -0.68
ynt        -0.67
amins      -0.66
FML        -0.66
arians     -0.66
pert       -0.65
Positive Logits
Token        Logit
Gw           1.12
ipeg         0.73
Savannah     0.72
icago        0.71
Kw           0.69
board        0.68
shield       0.68
aston        0.67
luster       0.67
Chattanooga  0.67
Activations Density 0.007%