INDEX
    Explanations

    words related to programming concepts, particularly in Python

    Negative Logits
    Token        Logit
    "oux"        -0.16
    "ign"        -0.15
    "adox"       -0.14
    "alted"      -0.14
    "igned"      -0.14
    "rias"       -0.14
    "pill"       -0.14
    "otos"       -0.14
    "commons"    -0.14
    "mouth"      -0.13
    Positive Logits
    Token        Logit
    " hello"     0.19
    " Hello"     0.17
    ".say"       0.16
    ">Hello"     0.16
    "hello"      0.16
    "Hello"      0.16
    "_HEL"       0.16
    "_hello"     0.16
    "42"         0.15
    " Summers"   0.15
    Activation Density: 0.213%

    No Known Activations