    Negative Logits
    Token            Logit
    " steroids"      -0.08
    "hair"           -0.08
    ".angle"         -0.08
    "Ey"             -0.07
    "Assets"         -0.07
    "ాగే"            -0.07
    (no token text)  -0.07
    " poda"          -0.07
    " gastric"       -0.07
    " yog"           -0.07
    Positive Logits
    Token            Logit
    "attention"      0.09
    " Reform"        0.08
    " abuses"        0.08
    " welfare"       0.08
    " interne"       0.08
    "ixin"           0.08
    " timeval"       0.08
    " Lucy"          0.08
    " Welfare"       0.08
    " obedience"     0.08
    Activation Density: 0.003%

    No Known Activations