INDEX
    Explanations

    expressions of agreement or affirmative responses

    Beginning of agreement/acknowledgment statements

    Negative Logits
    Token           Logit
    ()",            -0.64
     :",            -0.63
     =",            -0.61
    />";            -0.61
    ();*/           -0.57
     """\n          -0.57
    :",             -0.56
    /*",            -0.56
    Portail         -0.56
    (whitespace)    -0.56
    Positive Logits
    Token           Logit
    ,               0.97
     thats          0.80
     we             0.76
     maybe          0.75
     look           0.74
     sorry          0.73
     I              0.73
     they           0.72
    !               0.69
     no             0.68
    Activation Density: 0.110%

    No Known Activations