INDEX
Explanations
proper nouns or named entities
instances of the letter 'W' in various contexts
Negative Logits
succeeding    -0.77
arial         -0.77
displayText   -0.73
unpre         -0.70
revived       -0.68
EStream       -0.67
seiz          -0.65
illary        -0.65
milo          -0.65
unarmed       -0.64
Positive Logits
atts          1.19
ITNESS        1.15
edge          1.10
atson         1.07
OW            1.07
ashington     1.06
OOD           1.05
restling      1.05
itness        1.03
TF            1.00
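Dashboards of this kind commonly derive the two logit lists by projecting a feature's decoder direction onto the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that procedure; it is not confirmed as the method behind these specific numbers, and `w_dec`, `w_unembed`, the toy vocabulary, and the function name `top_logit_effects` are hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    Assumed inputs (hypothetical):
      w_dec:     decoder direction of one feature, shape (d_model,)
      w_unembed: unembedding matrix, shape (d_model, vocab_size)
      vocab:     list of token strings, length vocab_size
    """
    effects = w_dec @ w_unembed   # direct effect on each token's logit
    order = np.argsort(effects)   # token indices sorted by effect, ascending
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
w_dec = np.array([1.0, 0.0])
w_unembed = np.array([[0.5, -0.2, 1.1, 0.0],
                      [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_logit_effects(w_dec, w_unembed, ["a", "b", "c", "d"], k=2)
print(neg)  # [('b', -0.2), ('d', 0.0)]
print(pos)  # [('c', 1.1), ('a', 0.5)]
```

Because only the dot product with the decoder direction matters, the second row of `w_unembed` has no effect here: the toy decoder direction is zero along that dimension.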
Activation Density 0.042%
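Activation density is usually reported as the percentage of sampled tokens on which the feature fires. At 0.042%, that is about 42 tokens in every 100,000. The following is a minimal sketch under two assumptions: activations arrive as a flat list of floats, and "fires" means an activation strictly above a threshold of zero. The function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# 42 active tokens out of 100,000 sampled tokens.
acts = [1.0] * 42 + [0.0] * (100_000 - 42)
print(activation_density(acts))  # 0.042
```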