INDEX
Explanations
references to the concept of freedom and free will
Negative Logits
pun         -0.16
py          -0.15
Jenner      -0.15
riad        -0.15
sec         -0.15
stro        -0.15
sville      -0.15
errupted    -0.14
weis        -0.14
äch         -0.14
Positive Logits
-wheel      0.25
floating    0.22
Wheel       0.21
fall        0.21
-flow       0.21
quent       0.21
flow        0.21
lance       0.20
gan         0.20
boot        0.19
Activations Density: 0.033%