INDEX
Explanations
mentions of the number "two."
the word "two" in various contexts
Negative Logits
ugu       -0.80
asta      -0.74
amaru     -0.71
ubs       -0.70
rolet     -0.69
Caption   -0.68
rir       -0.68
aukee     -0.66
iggins    -0.66
awaru     -0.64
Positive Logits
thirds    1.50
halves    1.06
dozen     1.02
weeks     1.02
hundred   0.95
fold      0.94
teen      0.93
decades   0.90
een       0.86
teenth    0.86
Activations Density: 0.103%