INDEX
Explanations
articles and quantifiers in the text
Negative Logits
tests      -0.99
Events     -0.97
alties     -0.94
grounds    -0.92
iments     -0.88
votes      -0.88
agents     -0.84
rates      -0.82
words      -0.82
Init       -0.82
Positive Logits
silhouette  1.09
bunch       1.05
replica     1.03
swast       1.02
glimpse     1.00
plethora    1.00
miniature   1.00
cardboard   1.00
handful     0.99
suitcase    0.98
Activations Density: 0.255%