INDEX
    Explanations

    lists of categories or items

    Negative Logits
     shoved              0.44
     популя              0.43
     устра               0.43
     tiêu                0.42
     کنم                 0.42
     consolid            0.41
    embe                 0.41
    (token not shown)    0.40
     shame               0.40
     সরিয়ে               0.40
    Positive Logits
    '                    0.62
    (token not shown)    0.59
    μι                   0.58
    rokken               0.47
     Corbyn              0.45
    ξη                   0.44
     nvp                 0.44
    yy                   0.44
    ństwa                0.44
    क्क                   0.43
    Act Density 0.000%

    No Known Activations