    Explanations
    No Explanations Found
    Negative Logits
    Token                 Logit
    "ulaire"              -0.08
    "oria"                -0.08
    "å¯"                  -0.08
    "ipline"              -0.07
    "=-=-=-=-=-=-=-=-"    -0.07
    "evi"                 -0.07
    "jde"                 -0.07
    "peria"               -0.07
    "ahu"                 -0.07
    "APH"                 -0.07
    Positive Logits
    Token                 Logit
    " self"               0.08
    " health"             0.07
    " trait"              0.06
    " pron"               0.06
    "self"                0.06
    " well"               0.06
    "-self"               0.06
    " life"               0.06
    " Health"             0.06
    " slice"              0.06
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.