Explanations
phrases related to telling someone to stop doing something
phrases indicating emotional or confrontational expressions
Negative Logits
Cosponsors    -0.62
cially        -0.58
etheless      -0.57
sequently     -0.55
ocument       -0.52
uably         -0.51
APH           -0.50
uilt          -0.49
ocally        -0.49
orously       -0.48
Positive Logits
fucking       0.76
fuckin        0.75
shit          0.69
goddamn       0.68
daddy         0.65
crap          0.64
!",           0.62
damn          0.61
freaking      0.60
godd          0.59
Activations Density: 1.744%