    Explanations

    instances of the word "learn" in context

    Negative Logits
    Token         Logit
    " Bare"       -0.15
    "butt"        -0.15
    " Fashion"    -0.15
    " fashion"    -0.14
    "帯"          -0.14
    "oul"         -0.14
    "DUCT"        -0.14
    "acias"       -0.13
    "iona"        -0.13
    " bare"       -0.13
    Positive Logits
    Token         Logit
    "rock"        0.15
    " marty"      0.15
    "enin"        0.15
    "否"          0.14
    "malı"        0.14
    "hma"         0.14
    "elling"      0.13
    " pisc"       0.13
    " hra"        0.13
    "صح"          0.13
    Activation Density: 0.021%

    No Known Activations