INDEX
    Explanations

    mathematical notation and symbols

    Negative Logits
    "ez"        -0.17
    "ût"        -0.16
    "ughter"    -0.16
    "otech"     -0.15
    " Všech"    -0.14
    "yth"       -0.14
    "pez"       -0.14
    " vitam"    -0.14
    "etz"       -0.14
    "agra"      -0.14

    Positive Logits
    "cal"        0.25
    "bb"         0.23
    "bf"         0.23
    "fr"         0.20
    " cal"       0.20
    "bin"        0.19
    "choice"     0.18
    "ring"       0.18
    "inner"      0.18
    "str"        0.17
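    A minimal sketch of how top/bottom logit lists like these are commonly produced, assuming the usual convention that a feature's effect on vocabulary token v is the dot product of its decoder direction with column v of the model's unembedding matrix (any layer-norm scaling is ignored). The function names `logit_effects` and `top_and_bottom`, and the toy matrices, are hypothetical; this is not necessarily how this dashboard computes its values.

```python
def logit_effects(decoder_dir, unembed):
    # Hypothetical helper. decoder_dir: d_model floats;
    # unembed: d_model rows x vocab columns (list of lists).
    d_model = len(decoder_dir)
    vocab = len(unembed[0])
    # effect[v] = sum_i decoder_dir[i] * unembed[i][v]
    return [sum(decoder_dir[i] * unembed[i][v] for i in range(d_model))
            for v in range(vocab)]


def top_and_bottom(effects, tokens, k):
    # Hypothetical helper: sort (token, effect) pairs by effect, ascending.
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative effects first
    top = ranked[::-1][:k]     # most positive effects first
    return bottom, top


# Toy data only, not taken from this feature.
tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, 0.0]
unembed = [
    [0.5, -0.25, 0.25, -0.5],
    [0.3, 0.3, 0.3, 0.3],
]
effects = logit_effects(decoder_dir, unembed)
bottom, top = top_and_bottom(effects, tokens, k=2)
print(bottom)  # [('tok_d', -0.5), ('tok_b', -0.25)]
print(top)     # [('tok_a', 0.5), ('tok_c', 0.25)]
```

    Because the tokens are ranked by a single dot product, the same surface string with and without a leading space (for example "cal" and " cal") is a different vocabulary entry and can appear as a separate row.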
    Activation Density: 0.035%
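    Activation density is usually reported as the fraction of tokens on which the feature fires. A sketch under that assumption (the function name `activation_density` and the firing threshold of 0 are hypothetical): a density of 0.035% corresponds to roughly 35 firing tokens out of every 100,000.

```python
def activation_density(activations, threshold=0.0):
    # Hypothetical helper: fraction of tokens whose activation is
    # strictly above the threshold.
    if not activations:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy data: 35 firing tokens out of 100,000.
acts = [0.0] * 99_965 + [2.0] * 35
density = activation_density(acts)
print(f"{density:.3%}")  # 0.035%
```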

    No Known Activations