    Explanations

    Facebook software/AI usage

    Instances where the assistant refers to itself as an AI (e.g., "As an AI language model").

    Negative Logits
    sten
    -0.07
    _swap
    -0.06
    люч
    -0.06
    иц
    -0.06
    etty
    -0.06
    userid
    -0.06
     protestors
    -0.06
     demonstrates
    -0.06
    зации
    -0.06
    Username
    -0.06
    Positive Logits
     arbit
    0.07
    нівер
    0.07
    获得
    0.07
     универ
    0.07
     Turning
    0.07
    odon
    0.06
    .Word
    0.06
                                                                                                                                    
    0.06
    etchup
    0.06
     //////////
    0.06
    Act Density 0.018%

    No Known Activations