INDEX
Explanations
references to photographs and images
Negative Logits
man     -0.20
iams    -0.19
leo     -0.19
land    -0.18
ted     -0.18
lin     -0.17
most    -0.17
wn      -0.17
erna    -0.17
ward    -0.17
Positive Logits
electric    0.29
journal     0.29
hoot        0.28
ynthesis    0.26
volta       0.22
essay       0.21
album       0.21
wipe        0.21
chemical    0.20
ops         0.20
Activations Density: 0.022%