    Negative Logits
    "мых"             -0.96
    "mbal"            -0.84
    " знаний"         -0.81
    "Vien"            -0.81
    "estrogen"        -0.81
    "Interactive"     -0.81
    (token not shown) -0.77
    " کند"            -0.74
    "umburg"          -0.72
    (token not shown) -0.71
    Positive Logits
    " invisible"      3.75
    " Invisible"      3.17
    "invisible"       3.17
    "Invisible"       2.84
    " invis"          2.59
    "visibility"      2.52
    "INVISIBLE"       2.22
    " visibility"     2.11
    " Invis"          2.03
    " cloak"          1.91
    Activation density: 0.026%

    No Known Activations