    Explanations

    the word "one" as a common denominator

    instances of the word "one."

    Negative Logits
    oof             -0.67
     actionGroup    -0.67
    ooks            -0.66
    ories           -0.65
    akings          -0.64
    osponsors       -0.64
    emies           -0.63
    photos          -0.63
    thumbnails      -0.62
    actions         -0.62
    Positive Logits
     hundred         1.01
     wonders         0.93
     assumes         0.90
     thousand        0.87
     Hundred         0.87
     thing           0.84
     learns          0.79
     glance          0.76
     cannot          0.75
     sided           0.74
    Activation Density: 0.122%

    No Known Activations