INDEX
Explanations
mentions of news sources and reported information
sentences that report information or statements from various sources
Negative Logits
utter         -0.69
ourselves     -0.66
landscape     -0.65
à¼            -0.64
¬¼            -0.63
elephant      -0.62
pudding       -0.60
vict          -0.60
transistor    -0.60
deserved      -0.60
Positive Logits
Notably        1.03
Additionally   1.00
Investigators  0.95
However        0.93
Specifically   0.90
Meanwhile      0.89
Another        0.89
Similarly      0.86
Presumably     0.86
Along          0.86
Activations Density: 0.279%