INDEX
Explanations
occurrences of the word 'W' along with various prepositions and conjunctions
Negative Logits
bons    -0.15
h       -0.15
as      -0.15
loff    -0.15
rava    -0.14
repro   -0.14
ytt     -0.14
urls    -0.14
alls    -0.14
ksen    -0.14
Positive Logits
abi     0.22
spin    0.20
Indi    0.19
raz     0.19
roc     0.18
yr      0.18
ype     0.18
aha     0.18
cale    0.18
iel     0.18
Activations Density: 0.019%