INDEX
Explanations
names of places or people starting with the letter "W"
end-of-text markers, signifying the conclusion of a document or passage
Negative Logits
unpre           -0.80
Prelude         -0.73
bottleneck      -0.68
flares          -0.66
gratification   -0.64
manifold        -0.64
apprehension    -0.63
psy             -0.63
Metallic        -0.62
locality        -0.62
Positive Logits
ITNESS          1.33
restling        1.29
orthy           1.28
alking          1.25
OW              1.25
atson           1.24
ORD             1.24
esley           1.24
ITCH            1.24
ALK             1.23
Activations Density: 0.033%