INDEX
Explanations
URL links
URLs or web links in the text
Negative Logits
bills       -0.68
Plants      -0.65
apes        -0.65
memos       -0.64
Monkey      -0.63
ļéĨĴ        -0.62
Pyramid     -0.60
Throne      -0.60
Wasserman   -0.60
Doodle      -0.60
Positive Logits
hm          0.98
gallery     0.94
brow        0.88
jon         0.85
handle      0.84
kj          0.81
hash        0.81
expl        0.81
j           0.80
jac         0.79
Activations Density: 0.014%