INDEX
Explanations
controversial topics or events
the term "controversy" and its related contexts
Negative Logits
odes      -0.80
berman    -0.75
ingers    -0.74
lasses    -0.73
vae       -0.72
ells      -0.68
onz       -0.67
thening   -0.66
amina     -0.66
ourke     -0.65
Positive Logits
controversy    1.04
controversies  0.95
naire          0.94
shroud         0.83
naires         0.81
uproar         0.80
involving      0.80
flared         0.78
revolving      0.78
arises         0.77
Activations Density 0.013%