    Explanations

    the word "pose" and related string patterns that indicate potential threats or risks

    Negative Logits
    ?>">            -0.92
    '])){           -0.84
     EconPapers     -0.78
    martre          -0.76
    Datuak          -0.74
     <=",           -0.72
    '){             -0.72
    ")){            -0.72
    )){             -0.71
    '>              -0.70
    Positive Logits
     pose            0.82
    [:,              0.70
     posed           0.69
     posing          0.67
     poses           0.65
    وما              0.61
    Idle             0.61
     "+              0.60
    /'+              0.59
     FetchType       0.58
    Activation Density 0.160%

    No Known Activations