INDEX
Explanations
adjectives conveying danger or risk
references to dangerous situations or conditions
Negative Logits
gdala       -0.82
olitan      -0.82
ļéĨĴ        -0.80
mination    -0.79
glas        -0.78
hew         -0.78
via         -0.78
roma        -0.77
pel         -0.76
rix         -0.76
Positive Logits
adolesc       0.96
endanger      0.86
undermin      0.86
dangerous     0.80
danger        0.78
dangers       0.77
consequence   0.76
sounding      0.76
threats       0.75
overdose      0.74
Activations Density: 0.028%