    Negative Logits
    Token               Value
    " to"               1.06
    " on"               1.05
    " in"               1.02
    " mówi"             0.91
    " воды"             0.90
    "dan"               0.84
    " Üniversitesi"     0.84
    " were"             0.83
    " реки"             0.83
    " ludzie"           0.83
    Positive Logits
    Token               Value
    "ر"                 0.79
    "不但"              0.79
    "ες"                0.78
    "ਕਰ"                0.77
    (not shown)         0.77
    "ad"                0.77
    "ਾਵ"                0.75
    "н"                 0.75
    (not shown)         0.74
    "ας"                0.74
    Act Density 0.000%

    No Known Activations