INDEX
    Explanations
    Negative Logits
    Token          Logit
    " bear"        -0.08
    " hinge"       -0.07
    " nelle"       -0.07
    " halluc"      -0.07
    " bears"       -0.07
    " unfolding"   -0.07
                   -0.07
    " Schn"        -0.07
    " dini"        -0.07
    " hb"          -0.07
    Positive Logits
    Token          Logit
    "ancial"       0.10
    " Tilburg"     0.08
    "inel"         0.08
    " Crem"        0.08
    " Bark"        0.08
    " Vind"        0.08
                   0.07
    "annelse"      0.07
    " Plac"        0.07
    "Yan"          0.07
    Activation Density: 0.001%

    No Known Activations