INDEX
    NEGATIVE LOGITS
    Token          Logit
    "Seats"        -0.09
    "ɰ"            -0.08
    " Islamist"    -0.08
    " Quantity"    -0.07
    " משה"         -0.07
    "Honestly"     -0.07
    " İnt"         -0.07
    "เสา"          -0.07
    " ecstasy"     -0.07
    "素敵"         -0.07
    POSITIVE LOGITS
    Token               Logit
    " pp"               0.11
    " volum"            0.07
                        0.07
    "课堂"              0.07
    "bk"                0.07
    "overrides"         0.07
    ".communication"    0.07
    "_every"            0.07
                        0.07
    "processor"         0.06
    Act Density 0.003%

    No Known Activations