INDEX
Explanations
phrases indicating emotional or relational dynamics
Negative Logits
κει     -0.17
lesh    -0.16
ylie    -0.15
Sır     -0.15
riba    -0.15
SSL     -0.15
orm     -0.15
veau    -0.14
Ulus    -0.14
ampo    -0.14
Positive Logits
falls     0.22
falling   0.21
Falling   0.21
Falls     0.21
fell      0.21
asleep    0.19
Fallen    0.19
trap      0.19
fall      0.18
falls     0.18
Activations Density: 0.029%