INDEX
    Explanations

    the word "whom" in the text

    Negative Logits
        "pad"      -0.77
        "forcing"  -0.73
        "DEN"      -0.72
        "rid"      -0.68
        "jad"      -0.67
        "fix"      -0.66
        "case"     -0.65
        "lag"      -0.64
        "tight"    -0.64
        "termin"   -0.64
    Positive Logits
        "soever"    1.95
        "selves"    0.86
        " dearly"   0.79
        " whom"     0.75
        "onga"      0.70
        " alike"    0.66
        " coh"      0.63
        " shares"   0.63
        "usalem"    0.62
        " vou"      0.62
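    This page does not say how the logit lists are produced. SAE feature dashboards commonly compute them by projecting the feature's decoder direction onto the model's unembedding matrix W_U and keeping the most positive and most negative entries. The minimal pure-Python sketch below assumes that method. The function names, vectors, and token names are hypothetical toy values, not data from this feature.

```python
def logit_effects(decoder_dir, unembed):
    """Project a (hypothetical) feature decoder direction onto each token's unembedding.

    decoder_dir: list of d_model floats.
    unembed: dict mapping token string -> list of d_model floats (one column of W_U).
    Returns a dict mapping token -> direct logit contribution (a dot product).
    """
    effects = {}
    for tok, col in unembed.items():
        if len(col) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {tok!r}")
        effects[tok] = sum(d * u for d, u in zip(decoder_dir, col))
    return effects


def top_and_bottom(effects, k):
    """Return (k most positive tokens, k most negative tokens), each sorted by magnitude."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by logit effect
    positive = ranked[-k:][::-1]  # largest first
    negative = ranked[:k]         # most negative first
    return positive, negative


# Toy example: d_model = 3, a four-token vocabulary (all values hypothetical).
decoder = [1.0, 0.0, -0.5]
unembed = {
    "tok_a": [0.5, 0.25, -0.5],   # 0.5 + 0.0 + 0.25 = 0.75
    "tok_b": [-0.5, 0.25, 0.5],   # -0.5 + 0.0 - 0.25 = -0.75
    "tok_c": [1.0, 0.0, -2.0],    # 1.0 + 0.0 + 1.0 = 2.0
    "tok_d": [0.0, 1.0, 0.0],     # 0.0
}
effects = logit_effects(decoder, unembed)
positive, negative = top_and_bottom(effects, 2)
print("positive:", positive)  # [('tok_c', 2.0), ('tok_a', 0.75)]
print("negative:", negative)  # [('tok_b', -0.75), ('tok_d', 0.0)]
```

    With a real model, decoder_dir would be the feature's row of the SAE decoder and each column of W_U would correspond to one vocabulary token; only the ranking step stays the same.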
    Activation Density: 0.007%
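    Assuming activation density is the fraction of token positions on which the feature's activation is above zero, 0.007% corresponds to roughly 7 firing positions per 100,000 tokens. The sketch below shows that calculation. The threshold parameter and the synthetic activations are hypothetical illustrations, not this dashboard's data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (hypothetical default 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic data: 7 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(7):
    acts[i * 10_000] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.007%
```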

    No Known Activations