INDEX
    Explanations

    identity, existence, feelings

    Negative Logits
     vests           0.50
     advertisements  0.46
     hamburgers      0.46
     soybeans        0.45
     necessitated    0.43
     advertising     0.43
     strollers       0.41
     Cent            0.40
     showrooms       0.40
     Augusta         0.40
    Positive Logits
     stesso     0.54
     stessa     0.51
     włas       0.51
     fratello   0.51
    identité    0.50
     чувства    0.49
    Self        0.49
    esistenza   0.48
     스스로        0.46
     když       0.45
    Act Density 0.001%

    No Known Activations