    Negative Logits
    Token             Logit
    " Kug"            -0.09
    "sandbox"         -0.08
    " JT"             -0.08
    " Vaugh"          -0.08
    " Greater"        -0.07
    (blank)           -0.07
    "gm"              -0.07
    " Satan"          -0.07
    "gebiet"          -0.07
    "vos"             -0.07

    Positive Logits
    Token             Logit
    "eing"             0.09
    " ج"               0.09
    " frente"          0.08
    "-made"            0.08
    " calme"           0.08
    " abre"            0.07
    " provocative"     0.07
    " abl"             0.07
    (blank)            0.07
    "etaan"            0.07

    Activation Density: 0.004%

    No Known Activations