INDEX
    Explanations

    expressions of gratitude and inquiries about understanding or clarifying various topics

    Negative Logits
    Token    Logit
    !」       -1.03
    ?」       -1.02
    ?";      -0.81
             -0.76
    ?");     -0.75
    !';      -0.75
    !");     -0.72
    ?')      -0.71
    !');     -0.71
    ?”       -0.70
    Positive Logits
    Token    Logit
     !       1.94
     ?       1.71
     !)      1.29
     ?)      1.10
     !"      1.10
     ؟       1.09
     !”      1.08
     !?      1.04
     !!      1.04
     !'      1.03
    Activation Density: 0.305%

    No Known Activations