INDEX
Explanations
phrases related to controversial political topics
Negative Logits
iba         -0.81
uffle       -0.79
aten        -0.76
asha        -0.66
gra         -0.64
nel         -0.64
suicides    -0.63
aters       -0.62
indu        -0.62
ami         -0.62
Positive Logits
opportunity  1.05
chance       1.01
thumbs       0.95
choice       0.91
insight      0.89
berth        0.88
assurance    0.87
permission   0.87
assurances   0.83
latitude     0.81
Activations Density 2.290%