INDEX
Explanations
mentions of the importance or impact of events or actions
phrases indicating positive sentiment or enjoyment
Negative Logits
oulos       -0.59
Adin        -0.54
bestos      -0.54
Kaplan      -0.50
Madd        -0.49
Sutton      -0.49
meanwhile   -0.49
postwar     -0.48
Whitman     -0.48
moreover    -0.48
Positive Logits
alot        0.89
;)          0.73
haha        0.69
:)          0.67
crap        0.67
shit        0.66
dont        0.65
:(          0.64
pic         0.64
gonna       0.63
Activations Density: 2.133%