INDEX
Explanations
references to different types of bottles
references to bottles
Negative Logits
merce       -0.89
doms        -0.79
urities     -0.72
yrinth      -0.71
ansion      -0.68
IFA         -0.66
uli         -0.66
ominated    -0.65
tale        -0.65
entity      -0.65
Positive Logits
bottles     1.23
bottle      1.18
Bottle      1.01
Bott        0.93
Bott        0.91
opener      0.87
vodka       0.84
labelled    0.84
refill      0.83
mark        0.81
Activations Density: 0.026%