    Explanations

    phrases related to requests or offers for help

    Negative Logits
    ".            -0.86
    ).}           -0.85
    ).”           -0.85
    ).            -0.84
    ).'           -0.81
    )."           -0.80
     […]          -0.78
    […]           -0.77
    ).</          -0.75
    '])->         -0.75
    Positive Logits
     inderdaad    0.86
     thread       0.83
     OP           0.81
    <bos>         0.72
    ...@          0.71
    FTFY          0.69
    ↵↵↵           0.68
     indeed       0.66
     downvoted    0.66
     @            0.66
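    The logit lists rank vocabulary tokens by how strongly this feature raises or lowers their logits. Below is a minimal sketch of how such lists could be produced. It assumes, as is common for sparse-autoencoder dashboards though this page does not say so, that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The helper names `logit_effects` and `top_and_bottom_logits`, and the one-row-per-token layout of the unembedding matrix, are hypothetical.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Project a feature's decoder direction onto every token's unembedding.

    Assumed layout: unembed[i] is the d_model-sized unembedding vector of token i.
    """
    return [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]


def top_and_bottom_logits(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    if len(effects) != len(vocab):
        raise ValueError("effects and vocab must have the same length")
    pairs = list(zip(vocab, effects))
    # Largest values first: tokens the feature pushes up.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    # Smallest values first: tokens the feature pushes down.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example with a 4-token vocabulary (illustrative only).
    vocab = ["a", "b", "c", "d"]
    effects = logit_effects([1.0, -1.0], [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0], [-0.2, 0.1]])
    pos, neg = top_and_bottom_logits(effects, vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

    With a real model the decoder direction would have thousands of dimensions and the vocabulary tens of thousands of tokens; the toy inputs only illustrate the ranking step.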
    Activation Density: 0.732%
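    Activation density is typically the percentage of sampled tokens on which the feature fires. A value of 0.732% works out to roughly 7 tokens in every 1,000. The sketch below assumes a token counts as active when its activation exceeds zero; the page does not state its threshold, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of positions whose activation is strictly above `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # One active position out of eight gives 12.5%.
    print(activation_density([0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0]))
```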

    No Known Activations