INDEX
    Negative Logits
    Token             Logit
    " endl"           0.42
    "Ax"              0.40
    "IU"              0.38
    " AT"             0.38
    "ilingual"        0.38
    " mathvariant"    0.37
    "Pixel"           0.36
    "Language"        0.36
                      0.36
    "ONTO"            0.36
    Positive Logits
    Token             Logit
    " fra"            0.46
    " فرا"            0.45
    "ニャ"            0.45
    "/\/"             0.41
    "fra"             0.40
    "bottle"          0.40
                      0.40
    "რც"              0.40
    " SSA"            0.39
    "ন্যা"             0.39
    Activation Density: 0.002%

    No Known Activations