INDEX
    Negative Logits
    " Satu"          0.40
    (no token text)  0.37
    " erbjud"        0.37
    "トム"           0.37
    " patching"      0.37
    "アニメ"         0.36
    (no token text)  0.36
    "atu"            0.36
    " HEART"         0.36
    "岐阜"           0.36
    Positive Logits
    "antly"          0.39
    " знаешь"        0.38
    " twists"        0.38
    "})]"            0.38
    "oramic"         0.37
    "ardy"           0.37
    "волю"           0.36
    " منف"           0.36
    "Cox"            0.35
    " peril"         0.35
    Activation Density: 0.000%

    No Known Activations