    Explanations

    occurrences of the word "to" in various contexts

    Negative Logits
    Token          Logit
    " outl"        -0.78
    "ividual"      -0.74
    "mble"         -0.70
    " averages"    -0.65
    "raising"      -0.64
    " appropri"    -0.63
    " oun"         -0.63
    " vein"        -0.63
    " working"     -0.63
    "ancies"       -0.61
    Positive Logits
    Token          Logit
    "ilet"         1.11
    "jo"           1.04
    "ppo"          1.03
    "fen"          1.00
    "pper"         0.93
    "ppa"          0.93
    "pping"        0.93
    "fore"         0.92
    "ffee"         0.91
    "ven"          0.91
    Activation Density: 0.007%

    No Known Activations