Explanations
emotional reactions and reflections on experiences
Negative Logits
Token       Logit
зв          -0.14
inker       -0.14
according   -0.14
avier       -0.14
ład         -0.14
CURRENT     -0.13
باح         -0.13
searched    -0.13
we          -0.13
icking      -0.13
Positive Logits
Token       Logit
being       0.42
being       0.35
spending    0.30
Being       0.30
Being       0.29
被          0.27
sitting     0.26
seeing      0.26
hearing     0.25
Spending    0.25
Activations Density: 0.549%