    NEGATIVE LOGITS
    Token           Logit
    " EOS"          -0.09
    " pn"           -0.08
    " SG"           -0.07
    " vd"           -0.07
    " eb"           -0.07
    " Finn"         -0.07
    " Indicates"    -0.07
    " stead"        -0.07
    " Jog"          -0.07
                    -0.07
    POSITIVE LOGITS
    Token           Logit
    "Bart"          0.09
    " Petr"         0.08
    "ricula"        0.07
    "ă"             0.07
    "mnt"           0.07
    "sed"           0.07
    " Barber"       0.07
    " savings"      0.07
                    0.07
    " Cabinet"      0.07
    Activation Density: 0.004%

    No Known Activations