INDEX
Explanations
controversial or divisive topics and terms related to social and political issues
references to current political issues and controversies
Negative Logits
Token         Logit
ãĥ´           -0.61
ä             -0.59
ãĤ·ãĥ£        -0.55
é¾įå          -0.55
Seas          -0.54
cellaneous    -0.52
ŃĶ            -0.51
named         -0.51
YN            -0.51
Hur           -0.49
Positive Logits
Token         Logit
anymore       0.78
anytime       0.73
someday       0.72
sooner        0.69
legitimately  0.66
matically     0.64
instead       0.64
differently   0.64
ASAP          0.64
*****         0.60
Activations Density: 1.193%