INDEX
Explanations
the presence of the word "lo" or its variations, indicating themes of loss or lack
Negative Logits
ldr      -0.16
an       -0.16
569      -0.14
wid      -0.14
Serious  -0.14
ieve     -0.14
becue    -0.14
Mud      -0.14
ende     -0.14
èĮ       -0.14
Positive Logits
oting  0.26
cket   0.24
oser   0.23
vel    0.22
iter   0.21
opy    0.21
oters  0.21
athed  0.21
aves   0.21
oney   0.20
Activations Density: 0.003%