INDEX
Explanations
mentions of support for a specific organization or cause
the presence of the word "that" in various contexts
Negative Logits
Doctrine    -0.67
Corps       -0.62
Ruk         -0.62
Hallow      -0.61
Sad         -0.61
NES         -0.60
Anyway      -0.59
raz         -0.59
Planning    -0.59
riot        -0.58
Positive Logits
arose       1.01
preceded    0.93
comprise    0.91
occur       0.89
accumulate  0.89
weren       0.88
arise       0.88
resulted    0.88
circulate   0.86
compose     0.86
Activations Density: 0.191%