INDEX
    Explanations

    negative assertions regarding capability or understanding

    Negative Logits
    Token           Logit
    "agle"          -0.17
    "tos"           -0.16
    "ISTA"          -0.15
    "iente"         -0.15
    " sh"           -0.15
    "igan"          -0.14
    "weg"           -0.14
    " ago"          -0.14
    "uit"           -0.14
    "610"           -0.14

    Positive Logits
    Token           Logit
    "istrovství"    0.19
    "-lfs"          0.18
    " yoksa"        0.18
    "riad"          0.15
    "緒"            0.15
    "kân"           0.15
    "ani"           0.15
    "raquo"         0.14
    "zcze"          0.14
    "arb"           0.14

    Activation Density 0.057%

    No Known Activations