INDEX
Explanations
occurrences of the word "se" and its variations
Negative Logits
Token    Logit
c        -0.30
e        -0.28
cus      -0.23
y        -0.22
cip      -0.21
cene     -0.21
dum      -0.21
cj       -0.21
ãĥ³      -0.20
cis      -0.20
Positive Logits
Token       Logit
min         0.28
mp          0.27
mination    0.27
me          0.27
ment        0.27
men         0.27
mm          0.26
xt          0.26
man         0.25
parated     0.25
Activations Density: 0.016%