    Negative Logits
     Folks       -0.09
     Gest        -0.08
     punct       -0.08
     Boxer       -0.08
     rid         -0.08
     GST         -0.08
     Wed         -0.08
     Peoples     -0.07
     Prote       -0.07
     Vaughan     -0.07
    Positive Logits
     vigor       0.08
    _-           0.07
    street       0.07
    Dependent    0.07
    isar         0.07
                 0.07
     lenta       0.07
    RM           0.07
    llvm         0.07
    Atom         0.07
    Act Density 0.007%

    No Known Activations