    Negative Logits
        Token                 Logit
        "ana"                 -1.07
        "ANA"                 -0.99
        " gap"                -0.82
        " CreateTagHelper"    -0.77
        "gap"                 -0.77
        "AndEndTag"           -0.75
        "OGND"                -0.74
        " GOLDEN"             -0.73
        "Golden"              -0.71
        " Cæsar"              -0.71
    Positive Logits
        Token                 Logit
        " T"                   0.56
        " W"                   0.53
        " Ba"                  0.52
        " Lu"                  0.50
        " Ad"                  0.49
        " G"                   0.49
        " Pan"                 0.48
        " Ra"                  0.48
        " Ab"                  0.47
        " Br"                  0.47
    Activation Density: 0.281%

    No Known Activations