INDEX
Explanations
words or phrases associated with prepositions
Negative Logits
uction    -0.20
v         -0.19
pre       -0.17
c         -0.17
uku       -0.17
inn       -0.16
ext       -0.15
uctions   -0.15
uct       -0.15
rc        -0.15
Positive Logits
iminary   0.21
ursors    0.18
bí        0.17
byter     0.16
viously   0.16
linger    0.16
umpt      0.16
mium      0.16
uve       0.15
VIOUS     0.15
Activations Density 0.033%