Explanations
sentiments or opinions related to politics, government actions, and competitiveness
Negative Logits
©¶æ¥µ         -0.76
Ĥª            -0.74
yssey         -0.67
Ratings       -0.66
vironments    -0.66
udos          -0.60
ŃĶ            -0.56
almost        -0.55
images        -0.55
emen          -0.54
Positive Logits
or            1.37
nor           1.23
OR            0.80
versus        0.73
Or            0.70
Or            0.68
vs            0.62
merits        0.62
yourself      0.59
outright      0.59
Activations Density 0.244%
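
The density figure above can be read as the share of token positions on which the feature fires. The sketch below is only an illustration under that assumption: the function name `activation_density`, the zero threshold, and the toy activation list are hypothetical and do not come from the dashboard, whose exact definition may differ.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 1 of 8 token positions.
toy = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 12.500%
```

Under this reading, a density of 0.244% would mean the feature is active on roughly 1 in 410 token positions.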