INDEX
Explanations
references to public opinion and democratic processes
Negative Logits
dign        -0.17
avel        -0.14
iri         -0.14
лагод       -0.13
733         -0.13
iddi        -0.13
[assembly   -0.13
ussy        -0.13
аток        -0.13
Multiply    -0.13
Positive Logits
support     0.39
popular     0.38
public      0.37
opinion     0.32
popular     0.31
sentiment   0.30
Support     0.29
sentiments  0.28
public      0.27
support     0.27
Activations Density 0.273%