    Negative Logits
    "axis"          -0.08
    " qt"           -0.07
    "gratis"        -0.07
    "139"           -0.06
    "SCRIPT"        -0.06
    " McCoy"        -0.06
    "Basically"     -0.06
    " fields"       -0.06
    "ARB"           -0.06
    " Colombian"    -0.06
    Positive Logits
    " Bere"            0.07
    " achieving"       0.07
    " listItem"        0.07
    " turning"         0.07
    ".env"             0.06
    " Oral"            0.06
    " Instructions"    0.06
    " Brick"           0.06
    ".Drawing"         0.06
    " brute"           0.06
    Activation Density: 0.002%

    No Known Activations