INDEX
Explanations
proper nouns
the word "with" in various contexts
Negative Logits
Token        Logit
¥µ           -0.76
DEV          -0.69
rodents      -0.66
aepernick    -0.65
expression   -0.65
surg         -0.64
depress      -0.63
berman       -0.62
gging        -0.62
lucky        -0.62
Positive Logits
Token        Logit
ith          1.37
otle         1.06
iths         1.00
ium          0.97
ieth         0.92
yll          0.92
iop          0.87
ACA          0.82
ofer         0.81
ITH          0.80
Activations Density: 0.009%