Explanations
emotional reactions and personal reflections in text
Negative Logits
elli    -0.17
bew     -0.15
uber    -0.15
ython   -0.15
429     -0.15
è¿      -0.15
yped    -0.14
charm   -0.14
uted    -0.14
ifest   -0.14
Positive Logits
goose        0.27
sh           0.27
hairs        0.24
bile         0.20
pause        0.20
hack         0.19
uneasy       0.18
chill        0.18
gas          0.18
involuntary  0.18
Activations Density: 0.163%