    Negative Logits
    Token           Logit
     Goldberg       -0.15
     Amit           -0.15
    ounsel          -0.14
    273             -0.14
    967             -0.14
    876             -0.14
    utton           -0.14
    lland           -0.13
    oux             -0.13
    228             -0.13
    Positive Logits
    Token           Logit
    оÑĢалÑĮ         0.16
    erness          0.16
    arness          0.16
    IGHL            0.16
    bilt            0.15
    adu             0.15
     groupBox       0.15
    gency           0.14
    ActionCreators  0.14
    ãĥ³ãĤ¿          0.14
    Activation Density 0.011%

    No Known Activations