    Negative Logits
    Token         Logit
    "as"          0.51
    " δο"         0.47
                  0.47
    "occasion"    0.47
    " လက်"        0.46
    "at"          0.46
    " Σε"         0.45
    "μος"         0.45
    "्रिक"        0.45
    Positive Logits
    Token               Logit
    " náz"              0.45
    " были"             0.45
    "1"                 0.44
    " were"             0.44
    " সাত"              0.43
    " justifications"   0.42
    " worlds"           0.42
    " SRL"              0.41
    "ttes"              0.41
    " wastes"           0.40
    Activation Density: 0.000%

    No Known Activations