INDEX
Explanations
references to the usage of substances or practices, often related to health or protocols
Negative Logits
en            -0.91
o             -0.89
er            -0.81
es            -0.71
fer           -0.71
F             -0.69
Scha          -0.68
Y             -0.67
y             -0.67
u             -0.66
Positive Logits
Usage         1.23
usage         1.09
usage         1.09
Usage         1.05
USAGE         1.02
usages        0.97
USAGE         0.95
usages        0.88
^(@)          0.88
BibitemShut   0.85
Activations Density: 0.005%