    Explanations

    negative contractions and forms of denial

    Negative Logits
    Token             Logit
    "envolvimento"    -0.36
    "])]"             -0.35
    " vů"             -0.32
    "章节错误"        -0.31
    ")',"             -0.31
    "s"               -0.31
    "})`"             -0.30
    ")]$"             -0.30
    ")\"              -0.29
    ")])"             -0.29
    Positive Logits
    Token             Logit
    "SharedDtor"      0.75
    " müſſen"         0.74
    " للاسماء"        0.72
    " ویکی‌پدی"       0.70
    " Roskov"         0.70
    " queſta"         0.69
    "niſſe"           0.68
    "<unused52>"      0.66
    "[@BOS@]"         0.65
    "<unused3>"       0.65
    Activation Density: 0.000%

    No Known Activations