INDEX
Explanations
phrases related to the actions and behaviors of individuals or groups
phrases indicating political narratives and efforts regarding control
Negative Logits
Else              -0.68
replies           -0.65
opy               -0.64
Pak               -0.63
++                -0.62
@#&               -0.61
spoilers          -0.60
ISP               -0.59
Tsukuyomi         -0.58
DragonMagazine    -0.57
Positive Logits
championed        1.09
envisioned        0.98
pioneered         0.91
touted            0.91
promulg           0.88
inaug             0.86
perfected         0.85
abhor             0.84
outlined          0.83
sorely            0.82
Activations Density: 0.192%