INDEX
    Explanations

    concepts related to classification and existence

    NEGATIVE LOGITS
    дан
    -0.18
    шли
    -0.16
    bond
    -0.15
     باشید
    -0.15
     sollten
    -0.15
    amage
    -0.14
    ophobia
    -0.14
    utto
    -0.14
    ились
    -0.14
    strup
    -0.14
    POSITIVE LOGITS
    uje
    0.27
    ывает
    0.23
    uelve
    0.22
    ает
    0.21
    ствует
    0.21
    ίζει
    0.21
    ує
    0.20
    ζει
    0.19
    ulates
    0.18
    ивает
    0.18
    Act Density 0.070%

    No Known Activations