INDEX
    Explanations

    Instances where the assistant self-identifies or gives a disclaimer about being an AI (the model's "As an AI ..." style preface).

    Negative Logits
    "ystone"                -0.07
    " ung"                  -0.06
    "otre"                  -0.06
    " مق"                   -0.06
    (token not rendered)    -0.06
    "vals"                  -0.06
    " Γεω"                  -0.06
    " repairs"              -0.06
    " bilg"                 -0.06
    (token not rendered)    -0.06
    Positive Logits
    "(cv"                   0.06
    "Eth"                   0.06
    "\Service"              0.06
    "_references"           0.06
    " Eth"                  0.06
    "*(-"                   0.06
    (token not rendered)    0.06
    (token not rendered)    0.06
    "/:"                    0.06
    (token not rendered)    0.06
    Act Density 0.062%

    No Known Activations