INDEX
Explanations
instances where a particular person or political figure is mentioned
references to identity in various contexts
Negative Logits
zzo          -0.88
avorite      -0.76
âĨij          -0.71
Ħ¢           -0.71
Lank         -0.69
azel         -0.69
scill        -0.68
untreated    -0.68
¶ħ           -0.66
OTT          -0.66
Positive Logits
ity          1.15
ident        1.12
ities        1.10
ifiers       1.05
ially        0.97
itarian      0.95
ifier        0.91
ifying       0.90
iary         0.89
ified        0.88
Activations Density: 0.014%