Explanations
mentions of alcoholic beverages and their types
Negative Logits
dist      -0.18
ارات      -0.15
_chance   -0.14
uç        -0.14
ény       -0.14
464       -0.14
andr      -0.13
raman     -0.13
voy       -0.13
enta      -0.13
Positive Logits
white     0.26
Cab       0.26
whites    0.26
Mos       0.24
ries      0.23
Sau       0.23
cab       0.23
mos       0.22
white     0.22
Gew       0.22
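The positive and negative logit lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A common way such lists are produced, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and sort the results. The sketch below is plain Python; the function name `top_logit_effects`, the toy vectors, and the omission of any layer-norm scaling are hypothetical simplifications, and the numbers it prints are not the values shown above.

```python
from typing import Dict, List, Sequence, Tuple


def top_logit_effects(
    decoder_direction: Sequence[float],
    unembedding: Dict[str, Sequence[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by the dot product of a feature's decoder direction with
    each token's unembedding vector (hypothetical, simplified: no layer norm).

    Returns (positive, negative): the k highest-scoring tokens in descending
    order and the k lowest-scoring tokens in ascending order.
    """
    scores = {}
    for token, column in unembedding.items():
        if len(column) != len(decoder_direction):
            raise ValueError(f"dimension mismatch for token {token!r}")
        scores[token] = sum(d * u for d, u in zip(decoder_direction, column))

    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative effect first
    positive = ranked[::-1][:k]  # most positive effect first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; vectors are made up for illustration.
    direction = [1.0, 0.0]
    toy_unembedding = {
        "Cab": [0.3, 0.1],
        "dist": [-0.2, 0.5],
        "white": [0.25, -1.0],
    }
    pos, neg = top_logit_effects(direction, toy_unembedding, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Sorting all scores once and slicing both ends keeps the two lists consistent with each other; for a full vocabulary, a partial selection such as `heapq.nlargest` would avoid the full sort.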
Activation Density: 0.046%
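The density figure above reports how rarely the feature fires. Assuming it is the percentage of sampled tokens on which the feature's activation is above zero (the page does not state the threshold or sample size), 0.046% corresponds to roughly 46 active tokens per 100,000. A minimal sketch of that calculation, with the hypothetical helper `activation_density` and a made-up activation list:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not taken from the page.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Made-up sample: 46 firing tokens out of 100,000.
    sample = [1.2] * 46 + [0.0] * (100_000 - 46)
    print(f"{activation_density(sample):.3f}%")
```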