INDEX
    Explanations

    words that contain the substring "ar"

    Negative Logits
        Token     Logit
        ec        -0.25
        y         -0.24
        ene       -0.24
        enden     -0.23
        ent       -0.23
        etti      -0.23
        ey        -0.23
        ek        -0.21
        gers      -0.21
        ela       -0.21
    Positive Logits
        Token     Logit
        beiten    0.25
        thur      0.24
        oon       0.23
        beiter    0.23
        monic     0.22
        riors     0.21
        hyth      0.21
        ctic      0.20
        ctica     0.20
        aptor     0.20
    Activation Density: 0.112%

    No Known Activations