INDEX
Explanations
words associated with names or titles
Negative Logits
warts    -0.16
assis    -0.16
154      -0.16
rus      -0.15
duro     -0.15
sáng     -0.15
assen    -0.14
acer     -0.14
w        -0.14
urses    -0.14
Positive Logits
pike     0.21
mallow   0.18
bles     0.17
borough  0.17
.Mar     0.16
juana    0.16
============================================================================↵  0.16
Swinger  0.16
mar      0.16
Mystery  0.16
Activations Density 0.028%