INDEX
    Explanations

    references to the concept of freedom and free will

    Negative Logits
    pun        -0.16
    py         -0.15
     Jenner    -0.15
    riad       -0.15
    sec        -0.15
    stro       -0.15
    sville     -0.15
    errupted   -0.14
    weis       -0.14
    äch        -0.14
    Positive Logits
    -wheel     0.25
    floating   0.22
    Wheel      0.21
    fall       0.21
    -flow      0.21
    quent      0.21
    flow       0.21
     lance     0.20
    gan        0.20
    boot       0.19
    Act Density 0.033%

    No Known Activations