    Negative Logits
    "glomer"       -0.80
    "rontal"       -0.75
    "ersive"       -0.73
    "ities"        -0.70
    "ione"         -0.67
    "naires"       -0.66
    "arcer"        -0.63
    " Horowitz"    -0.63
    "abwe"         -0.62
    "\uFE0F"       -0.62   (U+FE0F, variation selector-16)
    Positive Logits
    "Hug"          1.08
    "frog"         1.03
    " canopy"      1.00
    "beard"        0.98
    "house"        0.94
    " stump"       0.94
    "yard"         0.91
    "bank"         0.90
    "Node"         0.89
    "houses"       0.87
    Activation density: 0.043%

    No Known Activations