Explanations
verbs indicating someone speaking or providing information
instances of quotation marks or reported speech
Negative Logits
Token        Logit
tumblr       -0.77
LET          -0.72
arest        -0.70
illions      -0.70
Holy         -0.69
isable       -0.69
respective   -0.68
pe           -0.66
Pages        -0.66
TABLE        -0.65
Positive Logits
Token          Logit
bluntly        0.83
sarcast        0.82
anecd          0.81
afterward      0.78
goodbye        0.72
heit           0.70
emphatically   0.70
essler         0.69
KR             0.69
doms           0.68
Activations Density: 0.175%