INDEX
Explanations
occurrences of the letter 'w'
Negative Logits
ayi     -0.20
x       -0.18
p       -0.17
rav     -0.17
ohl     -0.16
ay      -0.16
Stern   -0.16
r       -0.15
ish     -0.15
c       -0.15
Positive Logits
ester   0.19
eder    0.18
sis     0.18
ickets  0.17
olley   0.17
tte     0.16
avy     0.15
ih      0.15
try     0.15
asser   0.15
Activations Density 0.023%