INDEX
Explanations
instances of the substring "fro" in various forms
Negative Logits
y       -0.16
ariat   -0.15
usa     -0.15
ifu     -0.15
mits    -0.15
.Utc    -0.14
orous   -0.14
paren   -0.14
lam     -0.14
ux      -0.14
Positive Logits
sted        0.34
lick        0.29
thing       0.26
thy         0.23
lic         0.23
thed        0.22
sts         0.20
gs          0.19
issement    0.18
strup       0.18
Activations Density: 0.003%