INDEX
Explanations
drug-related terms
instances of uppercase letters or acronyms
Negative Logits
REDACTED      -0.75
wcsstore      -0.69
oufl          -0.66
×ij           -0.64
DEFENSE       -0.63
WOR           -0.63
ACTIONS       -0.62
OPER          -0.61
FIN           -0.61
substitutes   -0.61
Positive Logits
etch          0.74
tarian        0.73
tarians       0.71
cks           0.71
td            0.70
vable         0.68
Citiz         0.66
mx            0.65
ifter         0.65
tn            0.65
Activations Density 0.137%