INDEX
    Explanations
    Negative Logits
    Token        Logit
    "(Y"         -0.07
    " seine"     -0.06
                 -0.06
    "+\"         -0.06
    " Pt"        -0.06
    "(locale"    -0.06
    " Gregory"   -0.06
    "Rh"         -0.06
    "_core"      -0.06
    "ables"      -0.06
    Positive Logits
    Token        Logit
    "-shaped"    0.10
    "omin"       0.07
    "okin"       0.07
    "shaled"     0.07
    "-colored"   0.07
    "-themed"    0.07
    " shaped"    0.07
    " Shape"     0.07
    " parked"    0.07
    "-pack"      0.06
    Act Density 0.009%

    No Known Activations