INDEX
    Explanations

    references to language and translation

    Negative Logits
    Token         Logit
    "ox"          -0.16
    " Ireland"    -0.15
    "339"         -0.15
    "ض"           -0.14
    "chner"       -0.14
    " intra"      -0.14
    "ei"          -0.13
    "pron"        -0.13
    "cstdint"     -0.13
    "ov"          -0.13
    Positive Logits
    Token         Logit
    " English"    0.40
    "English"     0.35
    " Spanish"    0.32
    " Hebrew"     0.31
    " languages"  0.30
    " Arabic"     0.30
    " english"    0.29
    "英语"        0.29
    " French"     0.28
    "english"     0.28
    Activation Density: 0.164%

    No Known Activations