INDEX
    Explanations
    NEGATIVE LOGITS
    "WARD"          -0.73
    "href"          -0.72
    " plur"         -0.71
    "atives"        -0.70
    "ebin"          -0.69
    " Gamergate"    -0.69
    "NER"           -0.69
    "ership"        -0.69
    "kus"           -0.69
    "Filename"      -0.69
    POSITIVE LOGITS
    " pudding"      1.03
    " cake"         0.96
    "anut"          0.93
    " flavored"     0.92
    " coated"       0.89
    " chip"         0.89
    " chocolate"    0.87
    " syrup"        0.87
    " flav"         0.86
    " butter"       0.85
    Activation Density: 0.016%

    No Known Activations