INDEX
Explanations
physical harm or danger related words and situations
Negative Logits
Token        Logit
omorph       -0.26
ioxide       -0.24
ramid        -0.23
oter         -0.23
aminer       -0.22
alogy        -0.22
arta         -0.21
etheus       -0.21
othermal     -0.21
anyon        -0.21
Positive Logits
Token          Logit
enance         0.27
iate           0.27
soever         0.25
Compat         0.25
adoes          0.25
yet            0.24
theless        0.24
withstanding   0.23
whatsoever     0.23
iless          0.23
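The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The following is a minimal sketch of how such lists can be produced. It assumes each score is the plain dot product of the feature's decoder direction with a token's unembedding column. Real dashboards may also fold in normalization or scaling. The function names and toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_vec: Sequence[float],
    unembed_cols: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding column (assumed scoring rule)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed_cols.items()
    }


def top_and_bottom_tokens(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) tokens with their scores."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


if __name__ == "__main__":
    # Hypothetical 2-dimensional toy example with a three-token vocabulary.
    effects = logit_effects(
        [1.0, -0.5],
        {"a": [0.2, 0.0], "b": [0.0, 0.4], "c": [0.05, -0.1]},
    )
    neg, pos = top_and_bottom_tokens(effects, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Ranking with a full sort keeps the sketch short. For a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` / `heapq.nlargest` would avoid sorting everything.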
Activations Density 15.257%
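Activation density is the share of token positions on which the feature fires. At 15.257%, this feature is active on roughly one token in every six or seven. The following is a minimal sketch, assuming "fires" means an activation strictly greater than zero. The function name and `threshold` parameter are illustrative and not taken from the dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds
    `threshold` (assumed definition of 'firing')."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


if __name__ == "__main__":
    # Hypothetical activations for eight token positions; two of them fire.
    print(activation_density([0.0, 0.7, 0.0, 0.0, 0.0, 0.0, 2.1, 0.0]))
```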