    Explanations

    words related to categories or classifications

    Negative Logits
        Token      Logit
        uptools    -0.19
        te         -0.18
        raud       -0.16
        tti        -0.16
        ngen       -0.16
        traits     -0.16
        nard       -0.16
        tec        -0.16
        lli        -0.15
        sla        -0.15

    Positive Logits
        Token      Logit
        cly         0.23
        ción        0.23
        re          0.22
        h           0.21
        rella       0.19
        hom         0.18
        ways        0.18
        c           0.18
        cube        0.17
        fal         0.17

    Activation density: 0.020%

    No Known Activations