INDEX
    Explanations
    Negative Logits
    Token                   Value
    en                      3.61
    er                      3.29
    et                      3.07
    am                      2.90
    ooo                     2.78
    oq                      2.74
    oooo                    2.54
    (token not rendered)    2.51
    (token not rendered)    2.48
    гда                     2.42
    Positive Logits
    Token                   Value
    (token not rendered)    2.92
    𝑔                       2.68
    tobago                  2.64
    t                       2.52
    (token not rendered)    2.46
    Р                       2.37
    unfortunately           2.31
    tattoo                  2.28
    ंना                     2.26
     forhold                2.20
    Activation Density: 0.006%

    No Known Activations