INDEX
    Negative Logits
    AND             -0.66
     Usage          -0.64
     Meaning        -0.63
    onsequ          -0.62
     proof          -0.59
     Gallery        -0.58
    PLIC            -0.56
     Notes          -0.55
    ising           -0.55
     Radius         -0.54
    Positive Logits
     owns           0.95
     specialize     0.91
     specializes    0.89
     inhabit        0.82
     invests        0.81
     existed        0.79
     violates       0.79
     interacts      0.79
     operates       0.78
     participated   0.78
    Act Density 0.114%

    No Known Activations