    Explanations

    negative emotions

    Negative Logits
    ajaran              -0.07
                        -0.06
     confession         -0.06
                        -0.06
     Mediterranean      -0.06
    지역                -0.06
     그렇               -0.06
    mekte               -0.06
    DIV                 -0.06
    _NOTIFY             -0.06

    Positive Logits
     pars               0.06
     креп               0.06
    ar                  0.06
    غة                  0.06
    /types              0.06
    .stat               0.06
    erta                0.06
     sluts              0.06
    owego               0.06
    anno                0.06
    Activation Density: 0.035%

    No Known Activations