INDEX
    Negative Logits
    " Hots"     -0.66
    "abi"       -0.65
    "azeera"    -0.65
    "ting"      -0.65
    "ciating"   -0.65
    "umbnail"   -0.63
    "ortium"    -0.62
    "ontent"    -0.62
    "thood"     -0.61
    "ency"      -0.60
    Positive Logits
    "door"      1.24
    "bell"      1.15
    "steps"     1.06
    "holes"     1.04
    " doors"    1.00
    " door"     0.99
    "hole"      0.95
    " opener"   0.95
    " Door"     0.95
    "doors"     0.91
    Act Density 0.012%

    No Known Activations