INDEX
Explanations
references to recent events and statements made in public forums
Negative Logits
/-          -0.80
)/          -0.72
agogue      -0.69
oyal        -0.68
enfranch    -0.68
orno        -0.68
illusion    -0.66
hov         -0.64
ovan        -0.63
ibaba       -0.63
Positive Logits
remarks     1.03
tweets      0.97
tweeted     0.93
stating     0.91
remark      0.89
comments    0.88
sarcast     0.87
tweeting    0.86
tweet       0.86
quotes      0.85
Activations Density: 0.296%