INDEX
    Explanations

    hypotheses and definitions

    Negative Logits
    (tokens quoted to preserve leading spaces)
    "GPT"            0.64
    " GPT"           0.63
    "AI"             0.59
    " ai"            0.55
    "GBT"            0.55
    " AI"            0.53
    "gpt"            0.52
    "ChatGPT"        0.51
    " generative"    0.49
    " ChatGPT"       0.48
    Positive Logits
    (tokens quoted to preserve leading spaces)
    " >>>"           0.41
    " >"             0.38
    (blank)          0.37
    "arlier"         0.37
    "»"              0.36
    " Brian"         0.36
    " pe"            0.35
    " »"             0.35
    " Question"      0.35
    (blank)          0.35
    Activation Density: 0.001%

    No Known Activations