INDEX
    Explanations

    the letters "th" at the beginning of words

    Negative Logits
    Token        Logit
    "ech"        -0.17
    "antha"      -0.17
    "ease"       -0.15
    "ishly"      -0.15
    "imum"       -0.15
    "kinson"     -0.15
    "igans"      -0.14
    " па"        -0.14
    "edback"     -0.14
    "hid"        -0.14
    Positive Logits
    Token        Logit
    "ematic"     0.22
    " Th"        0.21
    "ales"       0.20
    "ailand"     0.20
    " th"        0.19
    "omas"       0.19
    "ALES"       0.19
    "rought"     0.18
    "ematics"    0.18
    " rough"     0.18
    Activation Density: 0.034%

    No Known Activations