Explanations
words related to danger and risk
instances of the term "dangerous."
Negative Logits
via         -0.85
arist       -0.81
ļéĨĴ        -0.79
mination    -0.77
roma        -0.76
olitan      -0.76
anish       -0.76
á           -0.75
elf         -0.75
angular     -0.74
Positive Logits
undermin    0.92
adolesc     0.90
dangerous   0.88
endanger    0.88
danger      0.81
danger      0.80
sounding    0.77
dangers     0.76
mischief    0.76
Danger      0.74
Activations Density 0.019%