Explanations
words that start with the letter 'o'
NEGATIVE LOGITS
Token     Logit
p         -0.21
pas       -0.20
l         -0.20
b         -0.19
rt        -0.18
m         -0.18
r         -0.18
pone      -0.18
f         -0.18
h         -0.18
POSITIVE LOGITS
Token     Logit
lymp      0.29
'clock    0.25
vens      0.25
missions  0.23
curring   0.23
phthalm   0.23
scar      0.23
aths      0.22
regon     0.22
posite    0.21
Activations Density: 0.011%