INDEX
    Explanations
    Negative Logits
    Token           Logit
    " as"           1.09
    " jeopard"      1.02
    "ش"             1.01
    "na"            0.96
    "ك"             0.95
    "),"            0.94
    "},"            0.93
    "กับ"           0.90
    " curtailed"    0.88
    " legitim"      0.87
    Positive Logits
    Token           Logit
    "a"             1.32
    "em"            1.24
    "er"            1.21
    "ى"             1.21
    " zacz"         1.20
    " pregunt"      1.15
    " mencion"      1.13
    "т"             1.13
    " взя"          1.11
    "erà"           1.11
    Activation Density: 0.014%

    No Known Activations