Explanations
references to various types of alcoholic beverages
Negative Logits
ipt        -0.16
ارات       -0.15
dist       -0.15
repr       -0.14
coc        -0.14
pek        -0.14
sun        -0.14
_chance    -0.14
ÑįÑĦ       -0.14
Works      -0.13
Positive Logits
white      0.28
whites     0.28
ries       0.26
Gew        0.24
Pin        0.24
Mos        0.24
ros        0.24
white      0.24
Cab        0.23
wines      0.23
Activations Density 0.043%