Explanations
words associated with governmental or political dissent
Negative Logits
HAHAHAHA            -0.74
bian                -0.68
bund                -0.67
cham                -0.63
Gingrich            -0.63
RC                  -0.63
inventoryQuantity   -0.62
HAHA                -0.61
Wem                 -0.60
Springfield         -0.60
Positive Logits
ments               1.44
ificant             1.24
eous                1.00
mentation           0.96
antly               0.94
s                   0.94
atures              0.94
ancy                0.92
ity                 0.90
ific                0.89
Activations Density 0.008%