INDEX
Explanations
proper nouns related to politics and personalities
mentions of specific individuals or entities linked to a narrative
Negative Logits
ocal      -0.81
utical    -0.79
ises      -0.77
icate     -0.74
UAL       -0.73
ual       -0.73
icals     -0.72
iary      -0.72
Mehran    -0.71
ically    -0.70
Positive Logits
bench     0.77
noon      0.75
pole      0.74
Stack     0.74
tons      0.73
tail      0.72
butt      0.72
mie       0.72
bra       0.71
ben       0.71
Activations Density: 0.028%