INDEX
    Explanations

    personal pronouns standing alone

    Negative Logits
    "theless"       -0.78
    "rules"         -0.69
    "rooms"         -0.67
    "imentary"      -0.66
    "taboola"       -0.62
    " cov"          -0.61
    " combustion"   -0.59
    "ded"           -0.59
    "ynamic"        -0.58
    "spawn"         -0.58
    Positive Logits
    "OUS"           1.11
    "AMI"           1.09
    "YA"            1.08
    "KE"            1.04
    "BILITY"        1.04
    "ALLY"          0.99
    "RECT"          0.99
    "BA"            0.98
    "ANS"           0.97
    "WI"            0.97
    Activation Density: 0.031%

    No Known Activations