INDEX
Explanations
expressions of frustration or annoyance, such as phew, ugh, oh, and sighs
Negative Logits
Token        Logit
iers         -0.42
icts         -0.41
iership      -0.39
jri          -0.39
":[{"        -0.39
ifles        -0.39
jug          -0.38
sanctioned   -0.38
forming      -0.38
ensis        -0.38
Positive Logits
Token        Logit
HHHH         0.55
hhhh         0.54
hhh          0.52
hh           0.49
goodbye      0.44
Clockwork    0.43
athe         0.43
awk          0.42
ouls         0.41
yeah         0.41
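The two lists above rank vocabulary tokens by how strongly the feature pushes their logits down or up. A minimal sketch of how such lists can be produced, assuming (this is not stated by the dashboard) that each value is the dot product of the feature's decoder direction with a token's column of the model's unembedding matrix; the names `logit_effects`, `decoder_dir`, `unembed`, and `vocab` are hypothetical:

```python
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str],
                  k: int = 10) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and k most positive token logit effects.

    Assumption: unembed is laid out d_model x vocab_size, so unembed[i][t]
    is the weight from residual dimension i to token t.
    """
    d_model = len(decoder_dir)
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the feature direction with token t's unembedding column.
        s = sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by effect
    negative = ranked[:k]          # most suppressed tokens first
    positive = ranked[::-1][:k]    # most promoted tokens first
    return negative, positive
```

With a toy two-dimensional model, `logit_effects([1.0, -1.0], [[0.5, 0.0, -0.25], [0.0, 0.25, 0.5]], ["a", "b", "c"], k=1)` ranks `c` as the most suppressed token and `a` as the most promoted one.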
Activation Density: 5.800%
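Activation density is usually read as the share of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the dashboard does not state its threshold or the size of the token sample); `activation_density` is a hypothetical helper:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total
```

Under this definition, a density of 5.800% corresponds to the feature firing on 58 of every 1,000 sampled tokens.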