INDEX
    Explanations

    GPT language models

    Negative Logits
    " Municipal"      -0.10
    " invaded"        -0.09
    "安排"            -0.09
    "Municip"         -0.09
    " municipal"      -0.09
    " electrician"    -0.09
    " Probate"        -0.09
    " सामान"          -0.09
    " deducted"       -0.09
    " नगर"            -0.09
    Positive Logits
    " GPT"            0.16
    " pretrained"     0.16
    " обуч"           0.14
    "GPT"             0.13
    " halluc"         0.12
    " 최신"           0.12
    " NLP"            0.12
    "trained"         0.12
    " improvements"   0.12
    " trained"        0.11
    Act Density 0.039%

    No Known Activations