INDEX
Explanations
words related to different types of poisons or poisoning incidents
references to various types of poison or poisoning incidents
Negative Logits
noon        -0.89
pora        -0.74
dar         -0.72
————        -0.69
tall        -0.68
aan         -0.66
peed        -0.65
Scouting    -0.63
sit         -0.63
mega        -0.63
Positive Logits
poisoning   1.09
poisoned    0.97
iv          0.92
poison      0.91
fumes       0.91
Ivy         0.88
arsenic     0.88
poisonous   0.87
gas         0.86
darts       0.84
Activations Density: 0.026%