INDEX
    Explanations

    statements emphasizing the significance or necessity of various subjects or concepts

    Negative Logits
    "ptrdiff"         -0.17
    "abus"            -0.16
    "rieg"            -0.15
    "irting"          -0.15
    "erty"            -0.15
    "cul"             -0.15
    "ild"             -0.15
    "reh"             -0.15
    "inqu"            -0.15
    "issy"            -0.14

    Positive Logits
    " importance"      0.28
    "/import"          0.24
    " Importance"      0.23
    " role"            0.19
    " significance"    0.17
    " Attached"        0.15
    " Role"            0.15
    "/utility"         0.15
    "/effects"         0.15
    "-role"            0.15

    Act Density 0.015%

    No Known Activations