Explanations
phrases related to expressing opinions or uttering words
phrases that indicate speech or communication
Negative Logits
shoot       -0.75
pleting     -0.69
onut        -0.67
equipped    -0.64
gart        -0.63
amate       -0.62
pend        -0.62
xtap        -0.61
Destroyer   -0.61
typh        -0.59
Positive Logits
goodbye     1.31
aloud       1.22
about       1.00
loudly      0.91
afterwards  0.89
uttered     0.83
beforehand  0.83
afterward   0.82
publicly    0.81
sarcast     0.80
Activations Density 0.073%