INDEX
    Explanations

    phrases indicating understanding or comprehension

    instances of the word "understands."

    Negative Logits
    Token           Logit
    "drop"          -0.61
    "bearing"       -0.61
    " resulting"    -0.60
    " «"            -0.59
    " gra"          -0.59
    " Boll"         -0.58
    " bearing"      -0.58
    " spring"       -0.58
    " dubious"      -0.57
    " random"       -0.57
    Positive Logits
    Token           Logit
    " understands"  3.53
    " knows"        1.89
    " understood"   1.87
    " understand"   1.80
    " realizes"     1.77
    " recognizes"   1.73
    " believes"     1.60
    " learns"       1.58
    " Understand"   1.52
    " accepts"      1.51
    Activation Density: 0.013%

    No Known Activations