INDEX
    Explanations
    Negative Logits
    Token          Logit
    "out"          0.45
    "on"           0.44
    " elated"      0.43
    " имеется"     0.41
    "resident"     0.40
    "كلة"          0.40
    "外出"         0.39
    "W"            0.39
    "felt"         0.38
    "in"           0.38
    Positive Logits
    Token              Logit
    " 숫자"            0.52
    " Theft"           0.49
    "ApiParam"         0.46
    " authoritarian"   0.46
    " angka"           0.45
    " tikai"           0.45
    " Wasser"          0.45
    " sayıda"          0.45
    " ainult"          0.44
    "鸡蛋"             0.44
    Activation Density: 0.018%

    No Known Activations