    Negative Logits
    Token       Logit
    " XX"       -0.07
    ".body"     -0.06
    "اتف"       -0.06
    " x"        -0.06
    " Aurora"   -0.06
    "ème"       -0.06
    " mould"    -0.06
    " Arist"    -0.06
    "(res"      -0.06
    "+r"        -0.05
    Positive Logits
    Token       Logit
    " 무슨"      0.08
    (not shown) 0.07
    (not shown) 0.07
    "ませ"      0.07
    "MSN"       0.07
    "ितन"       0.07
    "rebbe"     0.06
    (not shown) 0.06
    (not shown) 0.06
    "лер"       0.06
    Activation Density: 0.003%

    No Known Activations