    NEGATIVE LOGITS
    Token            Value
    (blank)          0.42
    " pyrid"         0.39
    (blank)          0.38
    " chemically"    0.38
    " Lipid"         0.38
    "rugada"         0.38
    " wrinkled"      0.37
    (blank)          0.37
    "<unused27>"     0.36
    "ruits"          0.36
    POSITIVE LOGITS
    Token            Value
    "github"         0.87
    "fmt"            0.87
    " github"        0.84
    " fmt"           0.79
    "testing"        0.65
    "encoding"       0.65
    " GitHub"        0.65
    "time"           0.61
    " Github"        0.61
    " strconv"       0.60
    Activation Density: 0.005%

    No Known Activations