Explanations
statements containing contentious or controversial topics and events
Negative Logits
urst                -0.74
isers               -0.71
zer                 -0.71
iser                -0.65
furt                -0.65
izer                -0.65
à¼                  -0.64
quickShipAvailable  -0.63
izers               -0.63
ãĥİ                 -0.63
Positive Logits
itia                0.74
assimil             0.74
prosecute           0.74
reciproc            0.72
Macy                0.71
vacc                0.70
hin                 0.70
sooner              0.69
mention             0.69
adequately          0.68
Activations Density: 0.118%