INDEX
    Explanations

    statements expressing opinions or beliefs

    Negative Logits
     themselves
    -0.18
     должно
    -0.17
     yourselves
    -0.14
    raud
    -0.14
    ubat
    -0.14
     →↵↵
    -0.14
     Their
    -0.14
     равно
    -0.14
    чила
    -0.14
    their
    -0.14
    Positive Logits
     himself
    0.75
     his
    0.52
     Himself
    0.45
    his
    0.42
    他的
    0.36
     نفسه
    0.34
     seinem
    0.32
     zijn
    0.32
    His
    0.31
     jeho
    0.30
    Act Density 1.820%

    No Known Activations