INDEX
    Explanations

    instances of high activation values, indicating significant emphasis or importance in the text

    Negative Logits
    agne        -0.18
    uyện        -0.15
    adera       -0.14
    ンズ        -0.13
    %%%%%%%%    -0.13
    #ad         -0.13
    ausal       -0.13
    ubb         -0.13
    urry        -0.13
     sic        -0.12
    Positive Logits
    untu        0.16
    ForEach     0.14
    ्वव         0.14
    арх         0.14
    aurus       0.14
    foon        0.14
    uptools     0.14
    809         0.14
    _ctor       0.14
    bih         0.13
    Activation Density: 0.035%

    No Known Activations