INDEX
Explanations
references to beverages, specifically alcoholic drinks
Negative Logits
optera    -0.15
kiye      -0.15
bred      -0.15
osaic     -0.14
isch      -0.14
azz       -0.14
-Sah      -0.14
ież       -0.13
زار       -0.13
izable    -0.13
Positive Logits
nested    0.17
utan      0.16
iloc      0.15
idel      0.14
rah       0.14
spiel     0.14
uç        0.14
그래      0.13
ypo       0.13
erman     0.13
Activations Density 0.013%