    Negative Logits
     فقط            -0.07
    _san            -0.07
     нему           -0.06
     M              -0.06
    пеки            -0.06
    adora           -0.06
     ));            -0.06
    frac            -0.06
    าษฎร            -0.06
    ièrement        -0.06
    Positive Logits
     novel          0.30
     novels         0.19
     Novel          0.18
     Assembly       0.07
     nave           0.06
    479             0.06
     psychologist   0.06
    小说            0.06
     novelty        0.06
     ром            0.06
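    The positive and negative logit lists presumably rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common way such lists are produced, assumed here and not confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the top and bottom k entries. The sketch below uses a hypothetical two-dimensional decoder vector, a hypothetical unembedding matrix and a four-token toy vocabulary; the function names `logit_effects` and `top_and_bottom` are illustrative, and none of the numbers are the real model weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Project a (hypothetical) feature decoder direction of length d_model
    through an unembedding matrix of shape d_model x vocab_size, giving one
    logit contribution per vocabulary token."""
    d_model = len(decoder_dir)
    assert len(unembed) == d_model, "unembed must have d_model rows"
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(scores: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Hypothetical toy example: d_model = 2, vocabulary of 4 tokens.
vocab = [" novel", " novels", "frac", "_san"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.4, 0.3, -0.1, -0.3],   # row 0 of the toy unembedding
    [0.2, 0.2, -0.2, -0.2],   # row 1 of the toy unembedding
]

scores = logit_effects(decoder_dir, unembed, vocab)
positive, negative = top_and_bottom(scores, k=2)
print("Positive:", [(t, round(v, 2)) for t, v in positive])
print("Negative:", [(t, round(v, 2)) for t, v in negative])
```

    A real dashboard may normalize the decoder direction or apply a final layer norm before the unembedding; the toy version above skips both.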
    Activation Density: 0.008%

    No Known Activations