INDEX
Explanations
personalized messages asking for approval or support
direct addresses or references to the reader
Negative Logits
ipal      -0.84
asio      -0.70
inct      -0.67
Cook      -0.67
ape       -0.67
apes      -0.67
Verge     -0.66
Dispatch  -0.66
acular    -0.66
otom      -0.65
Positive Logits
guys       1.24
're        1.17
tub        1.07
've        0.92
'll        0.89
sir        0.89
RS         0.87
know       0.85
filthy     0.84
gentlemen  0.80
Activations Density: 0.255%