    Negative Logits
     historian    -0.07
     згад         -0.07
    、「           -0.07
     Hitler       -0.07
    ublik         -0.06
     Homer        -0.06
     tabel        -0.06
     тай          -0.06
     علی          -0.06
     Malik        -0.06
    Positive Logits
    _dicts        0.06
    ewise         0.06
    ीन            0.06
     appeared     0.06
     ödeme        0.06
    ΐ             0.06
    _attachment   0.06
    PRI           0.06
    ภาพยนตร       0.06
    0.06
    Activation Density: 0.008%

    No Known Activations