Explanations
the word "one."
instances of the word "one."
Negative Logits
Token        Logit
ooks         -0.85
folk         -0.71
ypes         -0.69
older        -0.64
="#          -0.63
hips         -0.63
atin         -0.63
inders       -0.62
osponsors    -0.61
lite         -0.61
Positive Logits
Token        Logit
hundred      0.94
Hundred      0.88
thousand     0.76
dimensional  0.74
million      0.74
suitcase     0.71
sided        0.71
Million      0.71
Piece        0.71
embodiment   0.69
Activations Density: 0.087%