INDEX
    Explanations
    Negative Logits
    Token             Logit
    "gur"             -0.74
    "mble"            -0.72
    "eous"            -0.71
    "iosyncr"         -0.69
    "ffiti"           -0.69
    " Sachs"          -0.67
    "indal"           -0.67
    "undo"            -0.67
    " Protestant"     -0.64
    " Fellowship"     -0.63
    Positive Logits
    Token             Logit
    "aclysm"          1.36
    "heter"           1.18
    "fish"            1.08
    "chers"           0.98
    "cat"             0.97
    "cats"            0.93
    "alogue"          0.88
    " kittens"        0.88
    " paws"           0.86
    "cher"            0.84
    Act Density 0.015%

    No Known Activations