    Explanations

    expressing curiosity to learn

    NEGATIVE LOGITS
    " kwamba"       0.49
    "being"         0.43
    " embracing"    0.42
    " admitting"    0.42
    " being"        0.42
    " bahwa"        0.41
    " accepting"    0.40
    " BEING"        0.40
    "absorbing"     0.40
    " cope"         0.39
    POSITIVE LOGITS
    "ity"           0.54
    " curry"        0.52
    "ITY"           0.51
    " জানতে"        0.48
    " george"       0.47
    " Curry"        0.47
    " Understand"   0.44
    " /*"           0.43
    "了解"          0.43
    " cur"          0.42
    Activation Density: 0.007%

    No Known Activations