INDEX
Explanations
words related to carcinogens and substances potentially harmful to health
Negative Logits
gency     -0.90
ikarp     -0.68
Haku      -0.68
ging      -0.67
mble      -0.67
guyen     -0.65
die       -0.64
reluct    -0.63
theless   -0.63
gie       -0.63
Positive Logits
ogens     1.02
ational   0.91
adal      0.86
eland     0.86
otta      0.86
cano      0.85
agen      0.85
ifer      0.83
olate     0.82
tern      0.81
Activations Density: 0.014%