    Explanations

    historically

    Negative Logits
        " diffraction"        -0.08
        " Su"                 -0.07
        " Gand"               -0.07
        "Fans"                -0.07
        " blows"              -0.07
        " eman"               -0.07
        " Bush"               -0.07
        (token not rendered)  -0.07
        " Konz"               -0.07
        " Eph"                -0.07
    Positive Logits
        " behaviors"    0.08
        " tore"         0.07
        " Sant"         0.07
        " Smok"         0.07
        " apro"         0.07
        " micros"       0.07
        " simplistic"   0.07
        "endar"         0.07
        " massas"       0.07
        " Jack"         0.07
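Dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then taking the most negative and most positive entries. The sketch below assumes that procedure. The helper `top_logit_effects`, the toy vocabulary, and the random weights are hypothetical stand-ins, not the data behind this page.

```python
import numpy as np

def top_logit_effects(w_dec, w_u, vocab, k=3):
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper. Assumes:
      w_dec: feature decoder direction, shape (d_model,)
      w_u:   unembedding matrix, shape (d_model, vocab_size)
    """
    logits = w_dec @ w_u                  # direct effect of the feature on each token's logit
    order = np.argsort(logits)            # token indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy model: 4-dimensional residual stream, 6-token vocabulary.
rng = np.random.default_rng(0)
vocab = [" diffraction", " Su", " behaviors", " tore", "Fans", "endar"]
w_dec = rng.normal(size=4)
w_dec /= np.linalg.norm(w_dec)            # decoder directions are often unit-normalized
w_u = rng.normal(size=(4, len(vocab)))

neg, pos = top_logit_effects(w_dec, w_u, vocab)
print("Negative:", neg)
print("Positive:", pos)
```

Leading spaces are kept inside the token strings because, for most tokenizers, " Su" and "Su" are different tokens.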
    Act Density 0.010%
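Activation density is usually the share of token positions in a sampled corpus where the feature fires. The sketch below assumes a strict "activation > 0" threshold; the helper `activation_density` and the 10,000-token sample are hypothetical, chosen so that a single firing reproduces a 0.010% figure.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds `threshold`."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Hypothetical sample: the feature fires on 1 of 10,000 tokens.
acts = np.zeros(10_000)
acts[1234] = 3.7
density = activation_density(acts)
print(f"Act Density {density:.3%}")   # 1 / 10,000 = 0.0001, printed as 0.010%
```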

    No Known Activations