INDEX
    Explanations

    words related to negative actions or consequences

    Negative Logits
    Token         Logit
    "/umd"        -0.18
    "esser"       -0.16
    "GuidId"      -0.15
    " kino"       -0.14
    "alsy"        -0.14
    " Souls"      -0.14
    "ÅĽcie"       -0.14
    "assis"       -0.14
    "vala"        -0.14
    "èľĺèĽĽ"      -0.14
    Positive Logits
    Token         Logit
    " Club"       0.23
    "Club"        0.20
    " Frozen"     0.19
    " Penguin"    0.19
    " Snow"       0.18
    " cp"         0.18
    " frozen"     0.18
    " sled"       0.18
    "Snow"        0.18
    " club"       0.17
    Activation Density: 0.001%

    No Known Activations