    Explanations
    No Explanations Found
    Negative Logits
    Token         Logit
    "uro"         -0.14
    "iw"          -0.14
    "ipo"         -0.14
    "ken"         -0.14
    "floor"       -0.14
    "ffects"      -0.14
    "essional"    -0.13
    "wich"        -0.13
    " floor"      -0.13
    " Pow"        -0.13
    Positive Logits
    Token         Logit
    " principle"  0.25
    " practice"   0.21
    " presence"   0.20
    " contrast"   0.19
    " contrad"    0.18
    "practice"    0.18
    " spirit"     0.17
    " analogy"    0.17
    "presence"    0.17
    " absence"    0.17
    Activation Density: 0.068%

    No Known Activations