Explanations
occurrences of the character "l"
Negative Logits
Token      Logit
elim       -0.74
scrim      -0.70
cob        -0.69
decomp     -0.68
convict    -0.67
glim       -0.66
simul      -0.65
pyramid    -0.65
reprodu    -0.62
scrut      -0.62
Positive Logits
Token      Logit
s          1.28
ski        1.12
tsy        1.06
tal        1.06
tre        1.04
sin        0.99
til        0.98
sky        0.94
ship       0.93
thy        0.93
Activations Density: 0.086%