Explanations
references to casino games and online gambling
Negative Logits
Gad       -0.17
Dup       -0.15
ouser     -0.15
esy       -0.15
Canter    -0.14
dur       -0.14
ylland    -0.14
Irma      -0.13
Gamb      -0.13
Goods     -0.13
Positive Logits
abin      0.15
td        0.15
jde       0.14
nova      0.14
utt       0.14
/xhtml    0.14
utton     0.14
öy        0.14
εÏĦ       0.14
erno      0.13
Activations Density: 0.022%