INDEX
    Explanations

    adverbs that convey surprise or unexpectedness

    the word "surprisingly" and its variants to highlight unexpected results or observations

    Negative Logits
    Token              Logit
    "icip"             -0.78
    "arta"             -0.72
    "flight"           -0.71
    "icipated"         -0.68
    "lord"             -0.62
    "gang"             -0.62
    " Players"         -0.62
    "orem"             -0.60
    "rike"             -0.60
    "uese"             -0.60
    Positive Logits
    Token              Logit
    "beit"              0.82
    " situated"         0.80
    " LIMITED"          0.79
    " absent"           0.78
    " impressive"       0.71
    " shaped"           0.71
    " uncomfortable"    0.70
    "STEM"              0.69
    " Pengu"            0.68
    " dull"             0.68
    Activation Density: 0.047%

    No Known Activations