INDEX
    Explanations

    dialogue and expressions of agreement or acknowledgment

    Negative Logits
    " je"       -0.15
    "owler"     -0.15
    "reason"    -0.14
    "buzz"      -0.14
    "ream"      -0.14
    "Ŀ"         -0.14
    "ins"       -0.13
    "бот"       -0.13
    " cush"     -0.13
    " Snape"    -0.13
    Positive Logits
    " reply"    0.24
    "Reply"     0.21
    " replies"  0.20
    "reply"     0.20
    "egg"       0.19
    ".reply"    0.19
    " Reply"    0.18
    "ша"        0.18
    " replied"  0.18
    "_reply"    0.17
    Activation Density: 0.563%

    No Known Activations