Explanations
phrases indicating the presence or absence of species or elements in various contexts
Negative Logits
sh      -0.21
tron    -0.18
sp      -0.17
torch   -0.16
sw      -0.16
lick    -0.16
owie    -0.15
edes    -0.15
ars     -0.15
spi     -0.14
Positive Logits
entially   0.19
Presence   0.19
Presence   0.19
ential     0.19
presence   0.19
iment      0.19
presence   0.19
گاÙĩ   0.18
Mori       0.18
onym       0.17
Activations Density 0.035%