    Explanations

    instances of comprehension and awareness in context

    Negative Logits
    "onta"    -0.16
    "older"   -0.15
    "aná"     -0.15
    "usc"     -0.15
    "uggy"    -0.14
    "itou"    -0.14
    "alaria"  -0.14
    "igham"   -0.14
    "abin"    -0.14
    "allon"   -0.14
    Positive Logits
    " fully"       0.27
    "ably"         0.26
    "fully"        0.23
    " completely"  0.23
    " why"         0.22
    "ings"         0.21
    "Fully"        0.20
    "为什么"       0.20
    " about"       0.19
    " better"      0.19
    Activation Density: 0.064%

    No Known Activations