INDEX
    Explanations

    instances of words that may indicate emotional states or experiences

    Negative Logits
    Token         Logit
    "orum"        -0.18
    "inda"        -0.16
    "heimer"      -0.14
    " pop"        -0.14
    " "           -0.14
    "yme"         -0.14
    "303"         -0.14
    "iba"         -0.14
    " backward"   -0.14
    " litter"     -0.13
    Positive Logits
    Token         Logit
    "ĵåIJį"        0.18
    "meli"        0.17
    "achsen"      0.16
    "ovÃŃ"        0.15
    "ekte"        0.15
    "ander"       0.15
    "asset"       0.15
    "omik"        0.14
    "etrics"      0.14
    "_PA"         0.14
    Activation Density: 0.002%

    No Known Activations