Explanations
news sources and social media platforms
Negative Logits
Token         Logit
アル          -0.70
erness        -0.63
unspecified   -0.59
autions       -0.58
foul          -0.58
agonist       -0.56
secut         -0.55
circumstance  -0.55
ciplinary     -0.55
istrate       -0.54
Positive Logits
Token         Logit
etc           1.09
or            0.93
®,            0.78
).            0.70
and           0.67
).            0.66
(),           0.65
,             0.63
ramids        0.62
,             0.60
Activations Density 0.576%
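
The page does not define the density figure. A minimal sketch of one plausible reading, assuming density is the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 6 of which activate.
sample = [0.0] * 994 + [0.4, 1.2, 0.1, 2.5, 0.9, 0.3]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.600%
```

Under this reading, a density of 0.576% would mean the feature fires on roughly 6 of every 1,000 sampled tokens.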