Explanations
phrases conveying strong emotions, including hatred and frustration
expressions of emotional conflict and tension
Negative Logits
)—            -0.63
—             -0.60
uscript       -0.60
guiActiveUn   -0.60
ÂŃ            -0.57
Associated    -0.56
"—            -0.55
untarily      -0.55
emaker        -0.54
Newsletter    -0.53
Positive Logits
etc           1.16
haha          0.95
huh           0.85
etc           0.82
albeit        0.78
yeah          0.78
dunno         0.75
eh            0.73
blah          0.73
aka           0.73
Activations Density 0.588%