    Negative Logits
    Token       Logit
    "aeper"     -0.94
    "Ħ¢"        -0.87
    "achine"    -0.75
    "eg"        -0.69
    "esity"     -0.68
    "ecause"    -0.66
    "eer"       -0.66
    "PDATE"     -0.64
    "eers"      -0.64
    "ucket"     -0.64
    Positive Logits
    Token         Logit
    "Fed"         0.93
    "books"       0.85
    " whence"     0.85
    "forge"       0.83
    "book"        0.81
    "alogy"       0.78
    "finding"     0.76
    "Forge"       0.72
    "aneous"      0.71
    " Sources"    0.70
    Activation Density: 2.077%

    No Known Activations