INDEX
Explanations
the word "one" as a common denominator
instances of the word "one."
Negative Logits
oof            -0.67
actionGroup    -0.67
ooks           -0.66
ories          -0.65
akings         -0.64
osponsors      -0.64
emies          -0.63
photos         -0.63
thumbnails     -0.62
actions        -0.62
Positive Logits
hundred        1.01
wonders        0.93
assumes        0.90
thousand       0.87
Hundred        0.87
thing          0.84
learns         0.79
glance         0.76
cannot         0.75
sided          0.74
Activations Density 0.122%
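
The density figure above can be read as the share of token positions on which the feature fires at all. The page does not say how it was computed, so what follows is only a minimal sketch under that assumption: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of token positions with a
    nonzero (above-threshold) activation, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 12 of which activate the feature.
sample = [0.0] * 9988 + [1.5] * 12
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.122% would correspond to the feature firing on roughly 12 of every 10,000 tokens.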