Explanations
numeric values and dates relevant to factual events or statistics
Negative Logits
Token          Logit
ending         -0.16
embros         -0.15
edor           -0.15
enger          -0.14
vais           -0.14
inding         -0.14
olis           -0.14
essel          -0.14
inho           -0.14
ubit           -0.14
Positive Logits
Token          Logit
ADVERTISEMENT  0.16
orget          0.14
witter         0.14
tweet          0.14
tweet          0.14
twe            0.14
Tweet          0.14
WindowText     0.14
ADS            0.14
<\/            0.13
Activations Density 0.007%