    Negative Logits
    Token           Logit
    " PLL"          -0.08
    " LL"           -0.08
    " graduated"    -0.07
    "’y"            -0.07
    " Zahl"         -0.06
    "limits"        -0.06
    " Murphy"       -0.06
    " Thompson"     -0.06
    " Robertson"    -0.06
    " Dickinson"    -0.06
    Positive Logits
    Token           Logit
    " fake"         0.15
    " Fake"         0.11
    "Fake"          0.10
    "fake"          0.10
    "fak"           0.09
    "ek"            0.07
    "_fake"         0.07
    " bogus"        0.07
    " faux"         0.07
    " Fak"          0.07
    Activation Density: 0.004%

    No Known Activations