    Explanations

    Instances of the word "walk" in various forms.

    Negative Logits
    Token                Logit
    " وتسجيلات"          -0.75
    "dieu"               -0.61
    "\{\\"               -0.57
    "RetentionPolicy"    -0.57
    "영어"               -0.56
    " dramatic"          -0.56
    "tablir"             -0.54
    " Petit"             -0.54
    "UTIVE"              -0.54
    "reicher"            -0.53
    Positive Logits
    Token                Logit
    " walks"             1.17
    "walks"              1.14
    " walk"              1.08
    " walked"            1.06
    "walk"               1.05
    " walking"           0.99
    "Walk"               0.99
    " WALK"              0.97
    "walking"            0.89
    "Walking"            0.89
    Activation Density: 0.154%

    No Known Activations