INDEX
    Explanations

    instances of reporting or learning experiences and lessons

    NEGATIVE LOGITS
    "avage"     -0.16
    " Baghd"    -0.14
    "igi"       -0.14
    "athom"     -0.14
    "ấu"        -0.14
    "mav"       -0.14
    "atus"      -0.14
    "lav"       -0.13
    "åĨ"        -0.13
    " Bols"     -0.13
    POSITIVE LOGITS
    " discover"      0.49
    " learn"         0.48
    " discovered"    0.46
    " discovery"     0.44
    " discovers"     0.44
    " learns"        0.43
    "Learn"          0.42
    "learn"          0.42
    "Discover"       0.42
    " learned"       0.41
    Act Density 0.162%

    No Known Activations