INDEX
Explanations
names or words related to specific individuals
references to specific individuals or entities in a political context
Negative Logits
sburgh      -0.83
sers        -0.75
sburg       -0.74
IVERS       -0.73
afia        -0.71
namese      -0.69
behavi      -0.68
spirited    -0.67
ersion      -0.67
ainment     -0.66
Positive Logits
arella      0.89
eye         0.77
hao         0.75
aman        0.75
phrine      0.75
agame       0.74
orah        0.74
ipped       0.73
eta         0.72
arks        0.72
Activations Density: 0.024%