Explanations
references to unsafe or unhealthy conditions and behaviors
words related to unhealthy and unsafe conditions or behavior
Negative Logits
soType            -0.99
glas              -0.92
tsky              -0.88
ophers            -0.83
sheets            -0.83
veland            -0.82
agra              -0.81
phrase            -0.80
DragonMagazine    -0.79
vironment         -0.79
Positive Logits
undermin           0.73
wastes             0.73
compromises        0.73
unreasonable       0.71
ities              0.70
Ukrain             0.70
duplication        0.69
disproportion      0.68
behaviour          0.67
tendencies         0.66
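One common way feature dashboards of this kind produce the two logit lists is to project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive entries. The sketch below assumes that method; the function name `top_logit_effects` and its parameters are hypothetical, and the dashboard's actual computation or scaling of the values may differ.

```python
import numpy as np


def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Assumption (hypothetical helper): the effect on each token is the
    feature's decoder direction projected through the unembedding,
    i.e. decoder_vec @ W_U, giving one score per vocabulary entry.
    """
    effects = np.asarray(decoder_vec) @ np.asarray(W_U)  # shape: (d_vocab,)
    order = np.argsort(effects)  # token indices sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens.
decoder_vec = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.9, -0.7],
                [0.3, 0.1, -0.4, 0.2]])
neg, pos = top_logit_effects(decoder_vec, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # most negative effects first
print(pos)  # most positive effects first
```

Sorting once with `argsort` and slicing both ends keeps the positive and negative lists consistent with each other.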
Activations Density 0.040%
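Activation density is the fraction of token positions on which the feature activates, so 0.040% corresponds to roughly 4 tokens in every 10,000. A minimal sketch, assuming "activates" means an activation strictly above zero; the `activation_density` helper and its `threshold` parameter are hypothetical.

```python
import numpy as np


def activation_density(feature_acts, threshold=0.0):
    """Fraction of token positions where the feature fires.

    Assumption (hypothetical helper): a position counts as active when its
    activation is strictly greater than `threshold`.
    """
    acts = np.asarray(feature_acts)
    return float(np.mean(acts > threshold))


# Toy example: 10,000 token positions, the feature fires on 4 of them.
acts = np.zeros(10_000)
acts[[17, 2_500, 6_001, 9_999]] = [1.2, 0.4, 3.1, 0.8]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.040%
```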