INDEX
Explanations
references to harmful substances or situations
references to toxic substances or environments
Negative Logits
FORE      -0.74
ploma     -0.69
Untitled  -0.69
UTERS     -0.69
LEASE     -0.68
zzi       -0.68
365       -0.68
bler      -0.67
HCR       -0.67
BO        -0.66
Positive Logits
masculinity  1.03
ologist      1.00
ological     0.99
fumes        0.97
ology        0.95
poisoning    0.94
oxic         0.94
waste        0.93
substances   0.92
ologically   0.91
Activations Density 0.014%