    Negative Logits
    " flaw"       0.58
    " target"     0.58
    " pie"        0.57
    " appealing"  0.57
    " stap"       0.56
    " soothing"   0.56
    " enticing"   0.56
    " gl"         0.56
    " oblivion"   0.56
    " targeted"   0.56
    Positive Logits
    "2"  1.20
    "1"  1.09
    "۲۰"  1.04
    "疫情"  0.91
    "Covid"  0.90
    "新冠"  0.90
    " коронави"  0.89
    " ۲۰"  0.89
    "<unused1857>"  0.89
    " covid"  0.89
    Activation Density: 0.107%

    No Known Activations