INDEX
    NEGATIVE LOGITS
    Token          Logit
    "🤬"           0.40
    "ポケ"         0.38
    "maktadır"     0.38
    "வும்"         0.37
    " Worse"       0.36
    " schlim"      0.35
    " peor"        0.35
                   0.34
    "😐"           0.34
    " worse"       0.34
    POSITIVE LOGITS
    Token          Logit
    " optional"    4.44
    "optional"     4.16
    " Optional"    4.06
    "Optional"     4.06
    "可选"         3.17
    " optionally"  3.16
    " Optionally"  2.78
    "Optionals"    2.39
    "OPT"          2.25
    " वैकल्पिक"     2.08
    Activation Density: 0.135%

    No Known Activations