INDEX
Explanations
adjectives describing characteristics or qualities
various negative or critical descriptors and themes
Negative Logits
DonaldTrump    -0.67
iatus          -0.65
outube         -0.62
ilaterally     -0.61
itored         -0.61
;;;;           -0.61
DoS            -0.61
terday         -0.60
utics          -0.60
ecause         -0.60
Positive Logits
iest           0.95
equivalent     0.79
liest          0.79
element        0.73
closest        0.72
portion        0.71
nearest        0.70
ultimate       0.68
itself         0.66
version        0.66
Activations Density: 0.906%