INDEX
    Explanations
    Negative Logits
        " Hawk"             -0.99
        " hawk"             -0.84
        "Hawk"              -0.80
        "InjectAttribute"   -0.65
        " Hawks"            -0.65
        " hawks"            -0.63
        " EClass"           -0.61
        "assertIs"          -0.60
        " HAW"              -0.57
        "AndEndTag"         -0.56
    Positive Logits
        "room"               0.56
        "stone"              0.49
        "Trust"              0.49
        "marks"              0.49
        "साय"                0.49
        "box"                0.49
        "row"                0.47
        "Marks"              0.47
        "mark"               0.47
        "trust"              0.47
    Act Density 0.022%

    No Known Activations