Explanations
profane language
expressions of frustration or disdain towards things perceived as nonsensical or worthless
Negative Logits
hip          -0.92
sole         -0.75
vim          -0.75
significant  -0.73
versible     -0.73
lez          -0.69
ugal         -0.69
expression   -0.69
rez          -0.69
tein         -0.69
Positive Logits
crap         1.05
bullshit     0.98
BS           0.98
blah         0.94
nonsense     0.93
rubbish      0.92
excuse       0.89
excuses      0.81
Jindal       0.77
ocr          0.70
Activations Density: 0.006%