INDEX
    Explanations
    NEGATIVE LOGITS
    elingen  0.41
     حافظ  0.39
     anche  0.39
     (&  0.39
    قق  0.39
     eigentlich  0.38
    мови  0.37
    ÇÕES  0.37
     adjusts  0.37
     formes  0.37
    POSITIVE LOGITS
    𝓻  0.43
    โร  0.42
    ដើម្បី  0.38
    0.37
     negó  0.37
    shirt  0.36
     Bengali  0.36
    uro  0.35
     लोग  0.35
    โม  0.35
    Activation Density: 0.086%

    No Known Activations