INDEX
    Explanations

    the word "one."

    instances of the word "one."

    Negative Logits
    ooks          -0.85
    folk          -0.71
    ypes          -0.69
    older         -0.64
    ="#           -0.63
    hips          -0.63
    atin          -0.63
    inders        -0.62
    osponsors     -0.61
    lite          -0.61

    Positive Logits
     hundred      0.94
     Hundred      0.88
     thousand     0.76
     dimensional  0.74
     million      0.74
     suitcase     0.71
     sided        0.71
     Million      0.71
     Piece        0.71
     embodiment   0.69
    Activation Density: 0.087%

    No Known Activations