    Negative Logits
    Token                  Logit
    " Courage"             -0.65
    " Ballard"             -0.61
    "rained"               -0.59
    " proportion"          -0.59
    "RO"                   -0.59
    " sadd"                -0.58
    " BART"                -0.58
    " expulsion"           -0.58
    "oted"                 -0.57
    " Rend"                -0.56
    Positive Logits
    Token                  Logit
    "www"                  0.99
    "natureconservancy"    0.93
    "youtube"              0.87
    "xual"                 0.84
    "forums"               0.84
    "ibaba"                0.84
    "ctl"                  0.83
    "ertodd"               0.79
    ":/"                   0.78
    "ebin"                 0.77
    Act Density 0.007%

    No Known Activations