INDEX
Explanations
words related to political activities and events
statements related to political events and decisions
Negative Logits
ⓘ
-0.81
surprisingly
-0.71
ゴン
-0.63
+.
-0.63
instead
-0.59
!.
-0.58
"#
-0.58
Duo
-0.58
unless
-0.58
anwhile
-0.58
Positive Logits
..."
1.56
)"
1.42
…"
1.40
)."
1.35
)",
1.24
[
1.24
),"
1.18
)</
1.17
..."
1.14
['
1.13
Activations Density 1.085%