    Explanations

    references to personal or emotional introspection

    Negative Logits
    Token          Logit
    "ade"          -0.16
    " Castro"      -0.15
    "itra"         -0.15
    "duk"          -0.15
    ".tick"        -0.14
    "["            -0.14
    "igen"         -0.14
    "abile"        -0.14
    "emploi"       -0.14
    "ais"          -0.14

    Positive Logits
    Token          Logit
    " DISCLAIM"    0.16
    "ħn"           0.15
    "εÏį"          0.15
    "bcm"          0.15
    "ģn"           0.15
    "änger"        0.14
    "arra"         0.14
    "curities"     0.14
    "ymoon"        0.14
    "ģm"           0.14

    Activation Density: 0.050%

    No Known Activations