INDEX
Explanations
references to bars and drinking establishments
Negative Logits
Token        Logit
ska          -0.16
erness       -0.15
_reporting   -0.14
CALE         -0.14
arkin        -0.14
omic         -0.14
nal          -0.14
exus         -0.14
erring       -0.14
es           -0.14
Positive Logits
Token        Logit
riere        0.18
ucene        0.16
ucu          0.16
agg          0.16
ر            0.15
oque         0.15
bers         0.15
resi         0.15
AGMA         0.15
ivec         0.15
Activation Density: 0.037%
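To make the density figure concrete, here is a minimal sketch. It assumes that "activation density" means the fraction of tokens on which the feature's activation is strictly positive, a common convention for sparse-autoencoder dashboards that this page does not itself confirm. The function name `activation_density` and the sample data are hypothetical. Under that assumption, 0.037% corresponds to roughly 37 active tokens per 100,000.

```python
def activation_density(activations):
    """Fraction of tokens with a strictly positive activation.

    Assumed definition: the dashboard does not state how density is computed.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Hypothetical data: 37 active tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(37):
    acts[i * 2_000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")  # → 0.037%
```

A density this low means the feature is silent on almost all tokens, which fits a narrow concept such as references to bars and drinking establishments.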