INDEX
Explanations
terms related to politics and social issues
words related to criticism and skepticism
Negative Logits
ĸļ        -0.86
andel     -0.72
DRAG      -0.70
erity     -0.69
angan     -0.68
aurus     -0.66
peed      -0.65
ecause    -0.63
omething  -0.63
proport   -0.62
Positive Logits
hood   1.20
ism    1.00
gery   0.93
ishly  0.93
doms   0.92
dom    0.92
ry     0.91
ship   0.91
isms   0.90
esses  0.85
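The page lists token logits without saying how they were computed. A common approach on feature dashboards is to take the feature's decoder direction, dot it with each token's unembedding vector, and report the most positive and most negative scores. The sketch below assumes that approach. The function names and the tiny 2-dimensional vectors are hypothetical toy inputs, not values from this feature.

```python
def feature_logits(decoder_dir, unembed):
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector (assumed method)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(logits, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Hypothetical toy vectors; real models use thousands of dimensions.
decoder_dir = [1.0, -0.5]
unembed = {
    "hood": [1.0, -0.2],
    "ism": [0.8, 0.0],
    "andel": [-0.6, 0.3],
    "peed": [0.1, 0.4],
}

logits = feature_logits(decoder_dir, unembed)
top, bottom = top_and_bottom(logits, 2)
print("positive:", [tok for tok, _ in top])     # ['hood', 'ism']
print("negative:", [tok for tok, _ in bottom])  # ['andel', 'peed']
```

Sorting once and slicing from both ends keeps the sketch simple; for a full vocabulary, a partial selection such as `heapq.nlargest` would avoid sorting every token.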
Activations Density 0.232%
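Activation density is usually read as the fraction of token positions on which the feature fires. The sketch below assumes that definition, with "fires" meaning an activation strictly above a threshold of 0. Both the definition and the `activation_density` helper are assumptions, and the activations are made-up toy values. Under this reading, 0.232% corresponds to about 2.32 firings per 1,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation
    exceeds the threshold (assumed definition of density)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: the feature fires on 23 of 10,000 tokens.
acts = [0.0] * 9977 + [0.5] * 23
print(f"{activation_density(acts):.3%}")  # 0.230%
```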