INDEX
    Explanations

    prepositions and phrases indicating action or direction

    Negative Logits
    ify         -0.19
    nt          -0.18
    nip         -0.18
    t           -0.18
    pedia       -0.18
    nap         -0.17
    ingly       -0.17
    oretical    -0.16
    rin         -0.15
    rl          -0.15

    Positive Logits
    wner         0.26
    asters       0.22
    ffee         0.21
    ledo         0.21
    asty         0.21
    ppers        0.20
    pline        0.20
    eh           0.19
    asts         0.19
    /from        0.19
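    Lists like the two above are often produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below shows that idea. It assumes the decoder row has shape (d_model,) and the unembedding has shape (d_model, vocab_size). The function name top_logits is hypothetical, and this page does not say whether the dashboard applies extra steps such as folding in a final layer norm.

```python
import numpy as np

def top_logits(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by one feature direction's direct logit effect.

    Hypothetical helper (a sketch, not the dashboard's actual code). Assumes:
      w_dec_row: shape (d_model,), the feature's decoder direction
      w_unembed: shape (d_model, vocab_size), the unembedding matrix
      vocab:     list of token strings, length vocab_size
    """
    logits = w_dec_row @ w_unembed           # shape (vocab_size,)
    order = np.argsort(logits)               # indices, most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with a 2-dimensional model and a 4-token vocabulary.
neg, pos = top_logits(
    np.array([1.0, 0.0]),
    np.array([[0.3, -0.2, 0.1, -0.5],
              [9.0, 9.0, 9.0, 9.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
print(neg)  # most negative tokens first
print(pos)  # most positive tokens first
```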
    Act Density 0.090%
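    The density figure can be read as the percentage of token positions on which the feature fires. The sketch below computes it under that assumption, treating "fires" as an activation strictly above zero (the threshold is a stated assumption). The function name activation_density is hypothetical. At 0.090%, the feature would be active on roughly 9 of every 10,000 tokens.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the activation exceeds threshold.

    Hypothetical helper; assumes `acts` holds one activation value per token.
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 9 active positions out of 10,000 tokens gives 0.09 (percent).
sample = np.zeros(10_000)
sample[:9] = 1.0
print(activation_density(sample))
```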

    No Known Activations