INDEX
    Explanations

    instances of the word "learn" in various forms

    Negative Logits
    " Xie"           -0.53
    " Halliday"      -0.51
    " Carrington"    -0.51
    " peper"         -0.50
    "ting"           -0.50
    "ing"            -0.49
    " Spalding"      -0.49
    "ی"              -0.49
    "ICING"          -0.49
    " dis"           -0.48

    Positive Logits
    "learn"           1.55
    " Learn"          1.48
    "Learn"           1.48
    " learn"          1.40
    " LEARN"          1.25
    "LEARN"           1.23
    " learns"         1.12
    " aprende"        1.01
    " aprendido"      0.88
    "learned"         0.86

    Act Density 0.010%

    No Known Activations