INDEX
Explanations
phrases expressing disappointment or sadness
instances of the word "that" indicating expressions of regret or disappointment
Negative Logits
orah       -0.69
ead        -0.60
MH         -0.60
ocaust     -0.60
istance    -0.59
thur       -0.58
zman       -0.58
utable     -0.58
ione       -0.57
apor       -0.57
Positive Logits
soever     0.76
surrounds  0.75
cher       0.73
ndra       0.67
fy         0.65
76561      0.64
terday     0.63
ovie       0.63
they       0.63
pesky      0.61
Activations Density 0.268%