INDEX
Explanations
statements indicating disbelief or skepticism
expressions of strong opinions or sentiments
Negative Logits
Token        Logit
catentry     -1.02
cffffcc      -0.72
entimes      -0.71
ODUCT        -0.70
etermined    -0.68
Physical     -0.68
Initially    -0.64
lication     -0.63
³³³³         -0.63
TEXT         -0.63
Positive Logits
Token        Logit
kidding      0.96
ya           0.92
;)           0.80
idiots       0.79
bullshit     0.78
nerds        0.77
Stupid       0.76
whining      0.76
fools        0.75
ain          0.74
Activations Density: 1.543%