INDEX
Explanations
expressions of emotional experiences and personal reflections
Negative Logits
ÑĭÑĤ      -0.15
ai        -0.14
/us       -0.14
Kem       -0.13
ief       -0.13
Panc      -0.13
iesel     -0.13
Mey       -0.13
Britann   -0.13
ouz       -0.13
Positive Logits
us        0.26
you       0.22
me        0.21
him       0.20
them      0.17
sé        0.17
bạn       0.17
annya     0.16
vám       0.15
GOODMAN   0.15
Activations Density 0.492%