INDEX
    Explanations
    Negative Logits
    Token        Logit
    " Reſ"       -1.12
    " Theſe"     -1.12
    " Houſe"     -1.05
    " Diſ"       -1.01
    " Perſ"      -0.95
    " ſche"      -0.95
    " Inſ"       -0.94
    " Anſ"       -0.94
    " itſelf"    -0.93
    " Conſ"      -0.92
    Positive Logits
    Token            Logit
    "tonode"         0.69
    "buttonBar"      0.69
    "iastes"         0.64
    "AsUp"           0.63
    ":+:"            0.62
    " I"             0.59
    "esModule"       0.57
    "ArrowToggle"    0.56
    " ***!"          0.56
    "izr"            0.56
    Activation Density: 1.141%

    No Known Activations