Explanations
the word "pose" and related string patterns that indicate potential threats or risks
Negative Logits
Token         Logit
?>">          -0.92
'])){         -0.84
EconPapers    -0.78
martre        -0.76
Datuak        -0.74
<=",          -0.72
'){           -0.72
")){          -0.72
)){           -0.71
'>            -0.70
Positive Logits
Token         Logit
pose          0.82
[:,           0.70
posed         0.69
posing        0.67
poses         0.65
وما           0.61
Idle          0.61
"+            0.60
/'+           0.59
FetchType     0.58
Activations Density 0.160%