Explanations
references to beverages, particularly water and wine
Head Attr Weights
0:0.02
1:0.01
2:0.05
3:0.06
4:0.07
5:0.02
6:0.10
7:0.40
8:0.02
9:0.04
10:0.08
11:0.06
Negative Logits
urtles      -1.41
derail      -1.41
tracking    -1.40
aceae       -1.39
smugglers   -1.38
erto        -1.37
obstruct    -1.36
Tycoon      -1.35
hind        -1.34
aida        -1.34
Positive Logits
horm        1.58
goodbye     1.51
toast       1.49
champagne   1.46
龍�          1.44
mist        1.40
Goodbye     1.38
Prom        1.38
congr       1.37
Priv        1.34
Activations Density: 0.002%