    Explanations

    source of advice or criticism

    Negative Logits
    "RestorePolicy"     0.38
    "ffee"              0.38
    " حالی"             0.37
    " davon"            0.37
    "ૂર"                0.37
    " પૈ"               0.36
    "சிற"               0.36
    " गौरतलब"           0.35
    (token not shown)   0.35
    "nitř"              0.34
    Positive Logits
    " from"             1.12
    "from"              1.12
    "จาก"               1.09
    "来自"              1.03
    " từ"               1.02
    "來自"              1.02
    " från"             1.00
    " kutoka"           0.96
    " от"               0.89
    " από"              0.88
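The dashboard does not say how these logit lists are produced. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and read off the highest- and lowest-scoring tokens. The sketch below uses hypothetical names (`top_logits`, `decoder_dir`, `W_U`) and toy numbers, not this feature's real weights.

```python
import numpy as np


def top_logits(decoder_dir: np.ndarray, W_U: np.ndarray, vocab: list[str], k: int = 5):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumed convention (hypothetical, not confirmed by the dashboard):
    decoder_dir has shape (d_model,), W_U has shape (d_model, vocab_size).
    """
    scores = decoder_dir @ W_U        # one score per vocabulary token
    order = np.argsort(scores)        # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with hypothetical numbers: d_model = 2, vocabulary of 4 tokens.
vocab = ["from", "to", "the", "of"]
W_U = np.array([[2.0, -1.0, 0.5, 0.0],
                [0.0,  0.0, 0.0, 1.0]])
decoder_dir = np.array([1.0, 0.0])
positive, negative = top_logits(decoder_dir, W_U, vocab, k=2)
print(positive)  # [('from', 2.0), ('the', 0.5)]
print(negative)  # [('to', -1.0), ('of', 0.0)]
```

Sorting the full score vector once and slicing from both ends gives the positive and negative lists in a single pass.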
    Activation Density  0.039%
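Activation density is read here, as an assumption, as the fraction of token positions on which the feature's activation is above zero. At 0.039%, that is about one token in every 2,560. The sketch uses a hypothetical `activation_density` helper and synthetic activations (39 firing positions out of 100,000) chosen only to reproduce that figure.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`.

    The zero threshold is an assumption; the dashboard does not state its cutoff.
    """
    return float(np.mean(acts > threshold))


# Synthetic data: 100,000 token positions, the feature fires on 39 of them.
acts = np.zeros(100_000)
acts[:39] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.039%
```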

    No Known Activations