Explanations
phrases indicating emotional states or actions related to support and care
Negative Logits
urre      -0.15
Siz       -0.15
_pres     -0.15
麻        -0.14
nnen      -0.14
862       -0.14
ifo       -0.14
pray      -0.13
igious    -0.13
852       -0.13
Positive Logits
next      0.22
Next      0.21
circ      0.21
Falls     0.20
next      0.19
Fell      0.18
NEXT      0.18
fall      0.18
落        0.17
fell      0.17
Activations Density 0.044%