    Negative Logits
    Token           Logit
     Eld            -0.07
     войны          -0.06
     Elections      -0.06
     Jed            -0.06
    =@"             -0.06
     respectful     -0.06
    .Min            -0.06
    Hen             -0.06
     cigaret        -0.06
     Johannesburg   -0.06
    Positive Logits
    Token           Logit
    bakan           0.07
                    0.07
    ้าส             0.06
     multid         0.06
     obsession      0.06
     giới           0.06
                    0.06
     obsessed       0.06
                    0.06
     chứa           0.06
    Activation Density: 0.072%

    No Known Activations