INDEX
Explanations
words that contain the substring "ar"
Negative Logits
ec       -0.25
y        -0.24
ene      -0.24
enden    -0.23
ent      -0.23
etti     -0.23
ey       -0.23
ek       -0.21
gers     -0.21
ela      -0.21
Positive Logits
beiten    0.25
thur      0.24
oon       0.23
beiter    0.23
monic     0.22
riors     0.21
hyth      0.21
ctic      0.20
ctica     0.20
aptor     0.20
Activations Density 0.112%
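
The density figure above is usually read as the share of sampled token positions on which the feature fires at all. The page does not state its exact definition, so the following is only a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of token positions with a
    nonzero (above-threshold) activation. The source does not confirm this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 7 active positions among 6250 tokens.
sample = [0.0] * 6243 + [1.3, 0.4, 2.1, 0.9, 0.2, 3.0, 0.7]
print(f"{activation_density(sample):.3f}%")  # prints "0.112%"
```

A density this low means the feature is sparse: under the assumed definition, it is active on roughly one token position in every 900.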