INDEX
Explanations
phrases indicating a sense of loss or abandonment
NEGATIVE LOGITS
ostel       -0.15
rices       -0.15
avra        -0.14
arin        -0.14
lav         -0.14
sik         -0.14
Stay        -0.14
emey        -0.14
_MISSING    -0.14
reo         -0.13
POSITIVE LOGITS
feeling     0.23
wonder      0.22
stranded    0.21
wondering   0.21
with        0.20
without     0.18
speech      0.18
vulnerable  0.18
Wonder      0.17
Speech      0.17
Activations Density: 0.027%