    Negative Logits
    '
    0.97
     Elektr
    0.92
     an
    0.80
     Ag
    0.78
    swear
    0.77
     a
    0.75
    ן
    0.75
    ිරීම
    0.73
     An
    0.72
     echi
    0.72
    Positive Logits
     ተመሳሳይ
    0.98
    Same
    0.92
     mêmes
    0.90
     sensibles
    0.90
    SAME
    0.79
    𝑆
    0.78
     पूछ
    0.78
    ంట
    0.77
     అదే
    0.76
     Same
    0.75
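    The two lists above pair each token with a score. A common way for feature dashboards to produce such lists is to project the feature's decoder direction through the model's unembedding matrix and rank tokens by the result. The sketch below illustrates that assumed procedure in pure Python. The helper names `logit_effects` and `top_logits`, and the toy matrix and vocabulary, are hypothetical and do not come from this page. The sketch is not a claim about how these particular scores were computed.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], W_U: List[List[float]]) -> List[float]:
    """Project a feature's decoder direction through an unembedding matrix.

    Assumption: decoder_dir has length d_model and W_U is d_model x vocab_size.
    Returns one score per vocabulary token (the dot product with that token's column).
    """
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][v] for i in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_logits(
    scores: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]        # most negative scores first
    positive = ranked[::-1][:k]  # most positive scores first
    return positive, negative


# Hypothetical toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["Same", "mêmes", " an", " Elektr"]
W_U = [[1.0, 0.4, -0.2, -1.0],
       [0.0, 0.4, -0.8, 0.1]]
decoder_dir = [1.0, 1.0]

positive, negative = top_logits(logit_effects(decoder_dir, W_U), vocab, k=2)
print(positive)  # Same ranks first, then mêmes
print(negative)  # " an" ranks first, then " Elektr"
```

    In this toy setup, tokens whose unembedding columns align with the decoder direction land in the positive list, and tokens pointing the opposite way land in the negative list.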
    Act Density 0.006%

    No Known Activations
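    Activation density is usually read as the fraction of tokens on which a feature fires. At 0.006%, that works out to about 6 tokens in every 100,000. The minimal sketch below makes the arithmetic concrete. It assumes that "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the threshold are illustrative assumptions and are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: a feature counts as active on a token when its activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 tokens, of which the feature fires on 6.
acts = [0.0] * 100_000
for idx in (10, 2_500, 40_000, 55_555, 70_001, 99_999):
    acts[idx] = 1.3

density = activation_density(acts)
print(f"{density * 100:.3f}%")  # prints "0.006%"
```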