INDEX
Explanations
derogatory or dismissive language related to opinions or reviews
Negative Logits
nakalista     -0.53
gnore         -0.53
rån           -0.52
findpost      -0.51
kasarigan     -0.50
::~           -0.48
Responder     -0.48
TagMode       -0.48
nahilalakip   -0.47
Autowired     -0.47
Positive Logits
nonsense      1.14
onsense       0.95
nonsense      0.95
bullshit      0.91
shenanigans   0.76
crap          0.76
Nonsense      0.72
foolishness   0.71
fuss          0.68
stuff         0.67
Activations Density 0.225%
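
As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that activation density means the fraction of sampled token positions on which the feature fires (activation above zero). That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the feature is active.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 9 of 4000 token positions.
sample = [0.0] * 3991 + [1.2] * 9
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.225%
```

Under this reading, a density of 0.225% would mean the feature is active on roughly 2 to 3 of every 1000 tokens, which fits a feature tied to a narrow set of dismissive words.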