INDEX
Explanations
phrases related to safety and security measures
mentions of precautionary or preventive actions
Negative Logits
aples      -0.81
ille       -0.68
thia       -0.67
illes      -0.67
lishing    -0.66
iled       -0.65
Origins    -0.65
export     -0.64
raid       -0.64
Safari     -0.63
Positive Logits
measures   1.12
measure    1.04
Measures   0.96
measures   0.91
terday     0.90
agos       0.89
autions    0.86
iblings    0.78
mith       0.75
asures     0.75
Activations Density 0.010%