    Explanations

    phrases that convey realization and understanding

    Negative Logits
    Token       Logit
    "ÏĮν"       -0.14
    "rint"      -0.14
    "εÏģγ"      -0.14
    "acades"    -0.13
    "arov"      -0.13
    "ÑĢек"      -0.13
    " |/"       -0.13
    "oment"     -0.13
    " ãĢľ"      -0.13
    "zell"      -0.13
    Positive Logits
    Token       Logit
    " just"     0.72
    "just"      0.59
    " how"      0.57
    " exactly"  0.47
    " Just"     0.47
    " JUST"     0.46
    "Just"      0.44
    "how"       0.43
    " why"      0.42
    " juste"    0.41
    Act Density 0.279%

    No Known Activations