INDEX
Explanations
people's names, particularly ones related to significant events or controversial figures
terms related to bluntness and references to Osama bin Laden
Negative Logits
chrom     -0.84
ales      -0.79
ois       -0.78
ault      -0.78
itsu      -0.78
vals      -0.77
ented     -0.76
illard    -0.76
uci       -0.76
uers      -0.75
Positive Logits
lihood            0.81
_-                0.70
IST               0.67
cussion           0.66
zza               0.65
Bin               0.64
blunt             0.63
understatement    0.61
Osama             0.60
deaf              0.59
Activations Density: 0.076%