Explanations
references to potential dangers or threats, specifically related to traps and explosives
references to the concept of "booby" and related terms in various contexts
NEGATIVE LOGITS
ioned         -0.83
heses         -0.82
isco          -0.76
ivity         -0.73
acular        -0.69
oat           -0.69
runner        -0.69
acial         -0.68
erer          -0.68
ports         -0.67
POSITIVE LOGITS
pta            0.92
ãĥį            0.91
ãĥĥãĥĪ         0.87
ë              0.78
DoS            0.78
AGES           0.77
æ©Ł            0.71
============   0.70
ãĥ³ãĤ¸         0.70
ppo            0.69
Activations Density 0.046%
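
Several of the positive-logit tokens listed above (for example `ãĥį` and `æ©Ł`) look like byte-level BPE token strings rather than readable text. The sketch below shows one way to turn such strings back into characters. It assumes the tokenizer uses the GPT-2 style byte-to-character table; the source does not name the model or tokenizer, so treat that as an assumption. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not taken from the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2 byte-level BPE.

    Assumption: the dashboard's tokenizer uses this same mapping.
    """
    # Bytes that are already printable keep their own code point.
    keep = (list(range(ord("!"), ord("~") + 1))
            + list(range(ord("¡"), ord("¬") + 1))
            + list(range(ord("®"), ord("ÿ") + 1)))
    chars = keep[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in keep:
            keep.append(b)
            chars.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(keep, chars)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed token string back to bytes, then decode as UTF-8.

    Incomplete byte sequences are shown as U+FFFD instead of raising.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥį", "ãĥĥãĥĪ", "æ©Ł", "ãĥ³ãĤ¸", "ë", "DoS"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ãĥį` decodes to `ネ`, `ãĥĥãĥĪ` to `ット`, `æ©Ł` to `機`, and `ãĥ³ãĤ¸` to `ンジ`. The lone `ë` is a single lead byte of a multi-byte sequence, so it comes out as the replacement character, while plain ASCII tokens such as `DoS` pass through unchanged.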