INDEX
    Explanations

    instances of the word "in."

    Negative Logits
    "istes"      -0.16
    "stants"     -0.14
    "uben"       -0.14
    "inke"       -0.14
    "/by"        -0.13
    "/to"        -0.13
    "iste"       -0.13
    "asm"        -0.13
    "illing"     -0.13
    "ords"       -0.13
    Positive Logits
    "ROTO"       0.16
    "imid"       0.15
    " danger"    0.15
    "ackage"     0.15
    " essence"   0.15
    "ceptors"    0.15
    "eless"      0.15
    " ess"       0.14
    "LBL"        0.14
    " league"    0.14
    Act Density 0.100%

    No Known Activations