INDEX
    Explanations

    GPT, OpenAI, language models

    NEGATIVE LOGITS
    "🫶"  1.13
    "🥹"  1.05
    "🫰"  1.05
    "🫠"  1.00
    "🩷"  0.98
    "🫡"  0.97
    "🫣"  0.96
    "🫢"  0.93
    "🩶"  0.91
    "🫤"  0.88
    POSITIVE LOGITS
    " coronavirus"  0.86
    " ২০১৯"  0.78
    " Coronavirus"  0.70
    " Trump"  0.68
    "Coronavirus"  0.59
    "Trump"  0.56
    " pre"  0.54
    " করোনাভাই"  0.54
    "coronavirus"  0.54
    " ২০১৮"  0.50
    Activation Density: 0.036%

    No Known Activations