Explanations
references to toxicity and poisonous substances
Negative Logits
Token          Logit
BagLayout      -0.39
WriteLiteral   -0.33
Robe           -0.33
shawl          -0.32
Burt           -0.32
befe           -0.32
paille         -0.32
Torn           -0.31
Burt           -0.31
tackles        -0.30
Positive Logits
Token          Logit
poison         1.52
poison         1.43
Poison         1.38
poisoning      1.34
Poison         1.34
poisonous      1.33
toxic          1.31
toxicity       1.30
Toxic          1.27
toxic          1.24
Activations Density 0.570%