    Explanations

    explaining what things are or do

    Negative Logits
    "urities"  0.71
    "wet"  0.70
    " শরত"  0.64
    "вла"  0.63
    " ошиб"  0.62
    " unquestion"  0.62
    "reth"  0.61
    " группу"  0.59
    "ceding"  0.59
    "obb"  0.59
    Positive Logits
    " doing"  2.23
    " Doing"  2.09
    "Doing"  1.93
    "doing"  1.80
    " do"  1.55
    "做什么"  1.41
    " accomplishing"  1.40
    " lakukan"  1.34
    " doet"  1.31
    " accomplishes"  1.28
    Activation Density: 0.799%

    No Known Activations