Explanations
phrases indicating knowledge or expertise
Negative Logits
Token     Logit
zimmer    -0.16
bil       -0.16
uentes    -0.15
nip       -0.15
Boh       -0.15
ibling    -0.14
foon      -0.14
ythe      -0.14
jerne     -0.14
NOTICE    -0.14
Positive Logits
Token     Logit
ledged    0.23
ingly     0.23
ledge     0.22
ledge     0.20
-how      0.19
ings      0.19
edges     0.19
lesi      0.19
ance      0.19
les       0.19
Activations Density: 0.013%