INDEX
Explanations
the word "up" in various contexts, indicating a focus on upward movement or positivity
Negative Logits
t        -0.23
ings     -0.18
ildren   -0.16
undler   -0.16
tres     -0.16
awai     -0.16
amik     -0.16
ureau    -0.15
ectl     -0.15
esson    -0.15
Positive Logits
root     0.33
holding  0.33
ping     0.32
sets     0.31
state    0.30
river    0.30
ped      0.29
turned   0.28
ended    0.28
front    0.28
Activations Density 0.046%