Explanations
expressions conveying strong emotions or opinions
phrases that express strong opinions or hyperbolic statements
Negative Logits
elight      -0.65
¶           -0.64
=~=~        -0.61
eworks      -0.61
=~          -0.60
office      -0.60
ESPN        -0.60
é¾įå¥ij士    -0.60
onduct      -0.60
Oswald      -0.59
Positive Logits
ils          0.92
lot          0.88
coincidence  0.88
heck         0.81
waste        0.77
bunch        0.77
lovely       0.76
wonderful    0.75
fuss         0.74
hypocr       0.72
Activations Density 0.045%