    Negative Logits
    Token          Logit
    " Kal"         -0.08
    "SCR"          -0.08
    "Vin"          -0.08
    " affirm"      -0.08
    " Mund"        -0.08
    " Woll"        -0.07
    " Tire"        -0.07
    (not shown)    -0.07
    "protein"      -0.07
    " Vad"         -0.07
    Positive Logits
    Token          Logit
    " LB"          0.09
    "しか"         0.08
    "cat"          0.07
    " injections"  0.07
    (not shown)    0.07
    " Theresa"     0.07
    " spirited"    0.07
    " الكب"        0.07
    " sir"         0.07
    "cats"         0.07
    Activation Density: 0.006%

    No Known Activations