    Explanations

    words related to criticism or judgment

    phrases that express different types or kinds of things

    Negative Logits
    dn          -0.89
    gor         -0.79
    ĸļ          -0.77
    UF          -0.77
    idelines    -0.76
    alf         -0.75
    в           -0.75
    ï¸          -0.74
    none        -0.74
    APS         -0.73
    Positive Logits
     thing          1.56
     stuff          1.12
     behavior       1.05
     crap           1.01
     mischief       0.98
     behaviour      0.98
     shenanigans    0.97
     situation      0.96
     mentality      0.95
     activity       0.93
    Activation Density: 0.053%

    No Known Activations