Explanations
statements related to policies, politics, and organizations
Negative Logits
Token        Logit
CCC          -0.79
Eleven       -0.77
JD           -0.70
paragraph    -0.66
mire         -0.66
cgi          -0.62
Daniels      -0.61
Reply        -0.60
Posts        -0.60
Rating       -0.60
Positive Logits
Token        Logit
're           1.37
selves       1.17
selves       1.08
themselves   0.99
'll          0.99
've          0.96
zbollah      0.96
atically     0.92
respective   0.90
are          0.88
Activation density: 2.048%