Explanations
the word "one" in a variety of contexts and phrases
Negative Logits
Token         Logit
ooks          -0.83
ories         -0.75
inders        -0.74
osponsors     -0.71
hips          -0.71
respective    -0.67
actions       -0.65
ourn          -0.63
emies         -0.62
folk          -0.61
Positive Logits
Token         Logit
hundred       1.20
Hundred       1.14
thousand      1.01
dimensional   0.89
thing         0.88
Thousand      0.85
sided         0.85
month         0.85
hour          0.82
Piece         0.82
Activations Density: 0.389%
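
The page does not define the density figure. A minimal sketch, assuming it denotes the fraction of sampled token positions on which the feature has a nonzero activation. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and chosen for illustration only; they are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 1,000 token positions, 4 of which activate the feature.
sample = [0.0] * 996 + [0.7, 1.3, 0.2, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.389% would mean the feature fires on roughly 4 of every 1,000 sampled tokens.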