Explanations
occurrences of the substring "ber" in words
Negative Logits
Token      Logit
pls        -0.16
erece      -0.15
irr        -0.15
uren       -0.15
eh         -0.15
dpi        -0.15
.labelX    -0.15
ette       -0.15
erver      -0.15
tte        -0.14
Positive Logits
Token      Logit
ley        0.23
iginal     0.21
LEY        0.20
Heard      0.19
gs         0.18
trand      0.17
Alert      0.17
ness       0.17
nas        0.15
ELY        0.15
Activations Density 0.014%