INDEX
Explanations
terms related to safety and hazards in consumer products
Negative Logits
ihan        -0.16
ibble       -0.16
्वर         -0.16
fst         -0.15
ünd         -0.15
污          -0.15
カー        -0.15
Fraction    -0.14
ть          -0.14
annonces    -0.14
Positive Logits
safety      0.46
Safety      0.41
Safety      0.38
afety       0.35
fire        0.34
unsafe      0.33
Fire        0.31
Unsafe      0.30
safer       0.28
safe        0.27
Activations Density 0.150%