Explanations
references to different types of poison
terms related to poison and toxic substances
Negative Logits
noon      -0.87
irc       -0.76
blance    -0.75
ann       -0.74
planned   -0.70
dar       -0.68
stand     -0.67
den       -0.67
AMA       -0.66
arel      -0.65
Positive Logits
poisoning  1.22
poison     1.15
poisoned   1.08
dart       1.02
poisonous  0.98
arsenic    0.87
darts      0.87
poisons    0.86
antidote   0.85
gas        0.85
Activations Density 0.015%