INDEX
    Explanations

    negations and contradictions

    negative assertions about various subjects

    Negative Logits
     pione  -0.80
    ãĥİ     -0.75
    uador   -0.69
    ù       -0.68
    đ       -0.68
    \t      -0.68
    ā       -0.68
    Ĉ       -0.68
    Ă       -0.68
    ü       -0.68
    Positive Logits
    .     1.65
    .]    1.47
    !.    1.46
    .</   1.45
    .[    1.44
    .","  1.40
    !     1.39
    .(    1.39
    .)    1.32
    .'    1.31
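    The two logit lists rank vocabulary tokens by how strongly the feature pushes their output logits down or up. The following is a minimal sketch of how such lists are commonly derived in SAE dashboards. It assumes each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. It ignores details such as layer-norm folding, and the toy vocabulary and the names `logit_effects` and `top_and_bottom` are hypothetical, not this page's actual pipeline.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each token by dotting the feature's decoder direction with the
    token's unembedding vector (assumed scoring rule, no normalization)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) tokens with their scores."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    bottom = ranked[:k]           # most negative first
    top = ranked[::-1][:k]        # most positive first
    return top, bottom

if __name__ == "__main__":
    # Hypothetical 2-dimensional toy unembedding; real models use d_model dims.
    toy_unembed = {
        ".": [1.5, -0.3],
        "!": [1.2, 0.0],
        "uador": [-0.4, 0.6],
        " pione": [-0.5, 0.8],
    }
    top, bottom = top_and_bottom(logit_effects([1.0, -0.5], toy_unembed), k=2)
    print("Positive:", [(t, round(v, 2)) for t, v in top])
    print("Negative:", [(t, round(v, 2)) for t, v in bottom])
```

    Because only a dot product is involved, a feature whose decoder direction aligns with sentence-final punctuation embeddings would surface tokens like "." and "!." at the top, as in the list above.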
    Activation Density 1.356%
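    Activation density is usually reported as the share of sampled tokens on which the feature fires. The following is a minimal sketch assuming "fires" means an activation strictly above a threshold of 0. The sample data and the name `activation_density` are hypothetical. The actual dataset and threshold behind the 1.356% figure are not stated on this page.

```python
from typing import Iterable, Sequence

def activation_density(activations: Iterable[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens (across all sequences) whose activation exceeds
    `threshold` (assumed definition of density)."""
    total = 0
    active = 0
    for seq in activations:
        for a in seq:
            total += 1
            if a > threshold:
                active += 1
    if total == 0:
        raise ValueError("no tokens to measure")
    return 100.0 * active / total

if __name__ == "__main__":
    # Hypothetical per-token activations for two short sequences.
    sample = [[0.0, 0.0, 1.2, 0.0], [0.0, 0.3]]
    print(f"{activation_density(sample):.3f}%")
```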

    No Known Activations