INDEX
Explanations
words related to social issues and activism
Negative Logits
RAW          -0.84
DCS          -0.70
SPONSORED    -0.70
agonist      -0.68
Flavoring    -0.67
Lovecraft    -0.67
Defense      -0.66
puff         -0.66
warming      -0.65
clinton      -0.64
Positive Logits
alle         0.98
qui          0.92
mi           0.91
si           0.88
que          0.87
este         0.86
est          0.85
beit         0.85
cknowled     0.85
ja           0.81
Activations Density: 0.146%