INDEX
Explanations
emotional responses and expressions of concern
Negative Logits
ADDE            -0.18
ationToken      -0.17
Hindered        -0.16
dda             -0.16
νι              -0.16
DebugEnabled    -0.15
ATAB            -0.15
Erotische       -0.14
ipse            -0.14
Neck            -0.14
Positive Logits
liv             0.34
app             0.31
mort            0.30
pert            0.30
crest           0.27
inc             0.26
upset           0.25
chees           0.25
liv             0.23
crest           0.23
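The logit lists can be read as a feature's direct effect on the output vocabulary. A common way SAE dashboards produce such lists is to project the feature's decoder direction onto the model's unembedding matrix and sort the resulting per-token scores. The sketch below assumes that method; the function name `top_logit_effects`, the toy matrices, and the 4-token vocabulary are hypothetical and are not the data behind the lists on this page.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct logit effect.

    Hypothetical setup: w_dec_row is the feature's decoder vector, shape (d_model,),
    and w_u is an unembedding matrix, shape (d_model, vocab_size).
    """
    effects = w_dec_row @ w_u          # one logit contribution per vocabulary token
    order = np.argsort(effects)        # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]  # most positive first
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary (illustrative values only).
vocab = ["liv", "app", "ADDE", "Neck"]
w_u = np.array([[1.0, 0.5, -1.0, 0.0],
                [0.0, 0.5, 0.0, -0.5]])
w_dec_row = np.array([0.5, 0.25])
neg, pos = top_logit_effects(w_dec_row, w_u, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

Sorting once with `argsort` and reading both ends gives the negative and positive lists in a single pass, ordered the same way as the dashboard (strongest effect first).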
Activation Density: 0.270%
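Activation density is usually reported as the share of sampled token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name `activation_density`, its `threshold` parameter, and the toy data are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))  # mean of a boolean mask = fraction of True

# Toy data: 1000 token positions, the feature fires on 3 of them.
acts = np.zeros(1000)
acts[:3] = [1.2, 0.4, 2.0]
print(f"{activation_density(acts):.3%}")
```

A density well under 1%, as reported for this feature, means it is silent on the vast majority of tokens.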