Explanations
titles after "the" or "named"
Negative Logits
manly           0.56
partying        0.50
tummy           0.50
parody          0.47
macho           0.47
nipp            0.46
goodies         0.46
mommy           0.45
poodle          0.45
lube            0.45
Positive Logits
librarians      0.54
Botan           0.54
lighthouse      0.52
Archiv          0.52
nineteen        0.52
botan           0.51
Botan           0.50
apprenticeship  0.50
Library         0.49
seventeen       0.48
Activation Density: 0.043%