    Explanations

    words related to communication, such as speakers, listeners, reports, and feedback

    repeated punctuation marks, particularly commas, indicating a pattern or emphasis in the text

    Negative Logits
    units           -0.65
    ļéĨĴ            -0.64
    Goal            -0.59
    interstitial    -0.59
    aha             -0.59
    Override        -0.59
    Reward          -0.59
    Reply           -0.57
    account         -0.57
    Engineers       -0.55
    Positive Logits
     huh             1.06
     however         1.03
     sir             0.87
     albeit          0.86
     please          0.82
     though          0.81
     eh              0.80
     meanwhile       0.80
     yes             0.73
     yeah            0.70
    Activation Density: 0.463%

    No Known Activations