INDEX
    Explanations

    the letter 's' at the end of words

    the word "is" and its forms in various contexts

    Negative Logits
    " override"       -0.66
    " outweigh"       -0.65
    " Prior"          -0.64
    "alys"            -0.63
    " Reef"           -0.61
    " takedown"       -0.60
    " evaluations"    -0.60
    " scares"         -0.59
    "Prior"           -0.59
    " Hebdo"          -0.58
    Positive Logits
    "outhern"         0.87
    "pecially"        0.83
    "been"            0.82
    "forth"           0.77
    "enegger"         0.76
    "igi"             0.75
    "inki"            0.75
    "leeve"           0.74
    "por"             0.74
    "lightly"         0.73
    Activation Density: 0.109%

    No Known Activations