    Explanations

    a specific word related to medical conditions, spelled in different variations

    instances of the substring 'ir'

    Negative Logits
    Token           Logit
    " untreated"    -0.67
    "Ĥª"            -0.65
    " Wolver"       -0.63
    "er"            -0.61
    "ĨĴ"            -0.59
    "legates"       -0.59
    "ļé"            -0.58
    " warrant"      -0.58
    "alter"         -0.57
    " clue"         -0.57

    Positive Logits
    Token           Logit
    "vana"          1.33
    "rha"           1.14
    "andom"         1.06
    "mingham"       0.99
    "oux"           0.97
    "ror"           0.94
    "acial"         0.91
    "cles"          0.91
    "ROR"           0.89
    "abbit"         0.89

    Act Density 0.028%

    No Known Activations