INDEX
    Explanations

    common phrases or expressions indicating comparisons or contrasts

    Negative Logits
    Token       Logit
    "uj"        -0.15
    " дан"      -0.15
    " эта"      -0.15
    "ist"       -0.15
    " تلك"      -0.15
    "adele"     -0.14
    "字"        -0.14
    " эту"      -0.14
    "ung"       -0.13
    "ungs"      -0.13
    Positive Logits
    Token       Logit
    " eso"      0.42
    " cela"     0.35
    " isso"     0.35
    " ça"       0.35
    " vậy"      0.32
    " ello"     0.28
    " THAT"     0.28
    " ذلك"      0.28
    " đó"       0.25
    " esto"     0.25
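The dashboard does not say how its logit lists were computed. One common approach for sparse-autoencoder features, assumed here, is a direct-logit projection: multiply the feature's decoder vector by the model's unembedding matrix and read off the most negative and most positive vocabulary entries. The sketch below illustrates that assumption only; `top_logit_effects`, `w_dec_f`, `W_U`, and `vocab` are hypothetical names, and the toy values merely reuse a few numbers from the lists above.

```python
import numpy as np


def top_logit_effects(w_dec_f, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumed (hypothetical) shapes:
      w_dec_f: (d_model,)          decoder vector of one feature
      W_U:     (d_model, d_vocab)  unembedding matrix
      vocab:   list of d_vocab token strings
    """
    logits = w_dec_f @ W_U                       # (d_vocab,) direct logit contribution
    order = np.argsort(logits, kind="stable")    # ascending; stable tie-breaking
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage: an identity unembedding makes the logits equal the decoder entries.
vocab = [" eso", " cela", "uj", "ist", " THAT"]
w_toy = np.array([0.42, 0.35, -0.15, -0.15, 0.28])
neg, pos = top_logit_effects(w_toy, np.eye(5), vocab, k=2)
print(neg)  # [('uj', -0.15), ('ist', -0.15)]
print(pos)  # [(' eso', 0.42), (' cela', 0.35)]
```

Using a stable sort keeps tied tokens in vocabulary order, so repeated runs produce the same list, which matters when many tokens share a rounded value as they do above.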
    Activation Density: 0.301%
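Activation density is usually the fraction of token positions on which a feature's activation is above zero, so 0.301% corresponds to roughly 3 firing tokens per 1,000 (about 1 in 332). The sketch below assumes that definition with a zero threshold; the dashboard does not state its threshold or evaluation corpus, and `activation_density` is a hypothetical helper name.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = mean(acts > threshold) over all token positions.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))


# Toy example: 3 of 1,000 token positions fire.
acts = np.zeros(1000)
acts[:3] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.300%
```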

    No Known Activations