INDEX
    Explanations

    believes, claims, predicts, knows

    Negative Logits
     чого       0.43
     phrases    0.38
     чем        0.37
     phr        0.36
                0.36
     чего       0.36
     referred   0.36
     выбора     0.34
     whose      0.34
     WHAT       0.34
    Positive Logits
     exists         0.58
     happened       0.57
     Happened       0.57
     occurs         0.57
     existed        0.55
     происходит     0.54
     відбувається   0.54
     existir        0.53
     ocurrió        0.53
     Happens        0.53
    Activation Density: 0.011%

    No Known Activations