INDEX
    Explanations
    Negative Logits
    "ża"           -0.08
    " vandal"      -0.08
    " tattoo"      -0.08
    " profanity"   -0.08
    "nhof"         -0.08
    "isering"      -0.08
    "审批"         -0.08
    "afari"        -0.08
    " hometown"    -0.08
    " Fus"         -0.08
    Positive Logits
    "_Mode"        0.09
    "entropy"      0.08
    "enthal"       0.08
    " electron"    0.08
    " electrons"   0.08
    " therm"       0.08
    " entropy"     0.08
    "_mode"        0.08
    " темпера"     0.08
    " Mode"        0.08
    Activation Density: 0.002%

    No Known Activations