INDEX
    Explanations
    NEGATIVE LOGITS
    你自己  -0.82
     兼  -0.75
     причем  -0.75
    anat  -0.75
     интерес  -0.74
    нула  -0.72
    Naturally  -0.71
    AndEndTag  -0.71
    selling  -0.71
     plut  -0.71
    POSITIVE LOGITS
     regretted  1.23
     regrets  1.23
     glad  1.10
     regret  1.09
    امل  1.05
     ahora  1.02
    Turns  0.99
     arrep  0.95
     because  0.93
     ended  0.93
    Act Density 0.032%

    No Known Activations