INDEX
    Explanations

    affirmative expressions or words indicating agreement

    Negative Logits
    ')")        -0.63
    \}\\        -0.53
    ówno        -0.52
    "");        -0.51
    /*",        -0.50
    (not shown) -0.50
    achy        -0.49
     ۵          -0.49
    </s>        -0.48
    >");        -0.48
    Positive Logits
     Y          2.21
     y          1.80
     Yel        1.45
     YE         1.43
     YC         1.42
     Ys         1.40
     YP         1.39
     YR         1.38
     YM         1.37
     YS         1.36
    Act Density 0.123%
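
    Activation density is commonly read as the share of sampled tokens on which a feature fires. The sketch below assumes that definition, with any activation strictly above zero counted as firing. The function name `activation_density` and the sample values are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "firing" means activation > threshold (default 0.0).
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 3 of the 8 token positions have a nonzero activation.
sample = [0.0, 0.0, 1.2, 0.0, 0.4, 0.0, 0.0, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

    Under this reading, a density of 0.123% would mean the feature fires on about 12 of every 10,000 sampled tokens.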

    No Known Activations