INDEX
    Explanations

    references to subjective experiences or states

    Negative Logits
    thing     -0.16
    597       -0.14
    idious    -0.14
    689       -0.14
    ruba      -0.14
    umba      -0.13
    899       -0.13
     Zaman    -0.13
    ittest    -0.13
    edList    -0.13
    Positive Logits
    ologically    0.29
    ough          0.23
    oret          0.20
    tas           0.20
     way          0.19
    eway          0.18
    ore           0.18
     instant      0.18
    orie          0.17
     same         0.16
    Activation Density: 0.087%

    No Known Activations