INDEX
Explanations
references to specific entities or names starting with the letter "W"
references to the letter "W" in various contexts
Negative Logits
succeeding      -0.69
headache        -0.66
gratification   -0.66
afore           -0.65
Prelude         -0.64
enclosed        -0.63
predators       -0.63
bottleneck      -0.61
apprehension    -0.60
Malfoy          -0.60
Positive Logits
ITNESS          1.35
edge            1.19
INGS            1.19
ALK             1.18
ITCH            1.17
OOD             1.17
ipe             1.13
atson           1.13
ORD             1.12
idespread       1.12
Activations Density: 0.038%