INDEX
Explanations
occurrences of the number "two" in different contexts
instances of the word "two"
Negative Logits
Token     Logit
ugu       -0.76
asta      -0.72
awaru     -0.72
amaru     -0.69
ubs       -0.69
Fed       -0.67
rir       -0.66
ysical    -0.66
aukee     -0.63
uffer     -0.63
Positive Logits
Token     Logit
thirds    1.42
dozen     1.00
halves    0.98
weeks     0.95
hundred   0.88
teen      0.88
een       0.88
fold      0.86
thirds    0.86
decades   0.81
Activations Density: 0.075%