Explanations
references to gold or gold-related concepts
Negative Logits
eks        -0.18
ek         -0.16
et         -0.15
ulled      -0.15
otes       -0.15
anto       -0.15
Barg       -0.15
icon       -0.15
Smoking    -0.15
aro        -0.14
Positive Logits
stein      0.20
smith      0.19
rod        0.19
trap       0.18
rippling   0.17
piler      0.16
ambre      0.15
flen       0.15
otty       0.15
nug        0.15
Activations Density 0.030%