INDEX
    Explanations

    numerical representations or classifications

    Negative Logits
    v         -0.23
    l         -0.23
    auf       -0.23
    lut       -0.21
    ré        -0.21
    r         -0.21
    aqu       -0.20
    o         -0.20
    i         -0.20
    lk        -0.19

    Positive Logits
    obra       0.20
    ensored    0.20
    usp        0.19
    ursive     0.19
    ove        0.19
    actus      0.19
    rosso      0.19
    zech       0.19
    ypress     0.19
    oven       0.19
    Activation Density: 0.019%

    No Known Activations