INDEX
    Negative Logits
     Newark          -0.09
     citt            -0.09
    irs              -0.08
     pumpkin         -0.07
     Ehren           -0.07
     Stol            -0.07
     Müll            -0.07
     Umb             -0.07
     Thanksgiving    -0.07
    iceps            -0.07
    Positive Logits
     suav            0.08
    (token missing)  0.08
    _SECTION         0.08
     Roche           0.08
     politely        0.07
     Pu              0.07
     precedent       0.07
     rekening        0.07
    (token missing)  0.07
    Fen              0.07
    Act Density 0.006%

    No Known Activations