INDEX
Explanations
mentions that emphasize actions or events
references to organizations, agencies, and systemic structures involved in governance or societal issues
Negative Logits
etheless     -0.87
ゴン         -0.83
looph        -0.69
"#           -0.69
arnaev       -0.65
Amid         -0.65
Alas         -0.62
§§           -0.62
Amid         -0.62
Yet          -0.62
Positive Logits
really       1.09
[            1.08
kind         1.08
hasn         1.07
basically    1.07
doesn        1.05
...          1.01
understands  0.99
wants        0.98
everybody    0.98
Activations Density 0.426%