INDEX
Explanations
words related to potentially negative situations or outcomes
words indicating potential outcomes or risks
Negative Logits
baugh        -0.85
bern         -0.76
ger          -0.76
core         -0.75
fare         -0.71
vation       -0.71
woods        -0.70
ablishment   -0.69
otle         -0.69
ĸļ           -0.69
Positive Logits
jeopard      0.96
hazardous    0.95
lethal       0.86
synerg       0.84
contam       0.82
habitable    0.81
damaging     0.79
harmful      0.79
disrupt      0.78
dangerous    0.77
Activations Density 0.028%