INDEX
    Explanations

    instances of the word "down" in various contexts

    Negative Logits
    "ador"      -0.17
    "ocities"   -0.16
    "tings"     -0.15
    " làng"     -0.15
    "xd"        -0.15
    "ings"      -0.15
    " Derrick"  -0.14
    "quo"       -0.14
    "ted"       -0.14
    "adores"    -0.14
    Positive Logits
    "playing"   0.31
    "plays"     0.30
    "play"      0.30
    "played"    0.28
    "PLAY"      0.24
    "graded"    0.22
    "grading"   0.22
    "grades"    0.21
    "-play"     0.21
    " plays"    0.20
    Act Density 0.018%

    No Known Activations