Explanations
Phrases related to being under the influence of something, often alcohol.
Negative Logits
Token         Logit
ern           -0.72
boxing        -0.67
supra         -0.66
erness        -0.66
hops          -0.65
aries         -0.65
phabet        -0.65
aza           -0.64
ahead         -0.64
pring         -0.63
Positive Logits
Token         Logit
guise          1.26
supervision    1.21
ausp           1.18
influence      1.15
microscope     1.09
radar          1.09
umbrella       1.03
guidance       1.03
pretext        0.97
hood           0.96
Activation density: 0.051%