Explanations
references to slot machines and gambling
Negative Logits
isd         -0.17
emoc        -0.16
fencing     -0.15
ãĥ¼ãĤº      -0.15
snapping    -0.15
ington      -0.14
Suk         -0.14
zew         -0.14
Pills       -0.14
ãĤĩ         -0.14
Positive Logits
progressives  0.28
machines      0.27
machine       0.24
machine       0.22
-machine      0.22
Machine       0.21
Machines      0.21
(machine      0.20
Machine       0.20
.machine      0.20
Activations Density: 0.015%