Explanations
topics or instances related to controversies
mentions of controversy
Negative Logits
urances    -0.75
onz        -0.66
lasses     -0.66
ingers     -0.65
avement    -0.64
urance     -0.63
endar      -0.63
intest     -0.62
doors      -0.62
ells       -0.62
Positive Logits
controversy      0.94
naire            0.90
controversies    0.89
revolving        0.85
uproar           0.84
flared           0.81
involving        0.81
erupted          0.80
raged            0.78
arises           0.78
Activations Density 0.019%