    NEGATIVE LOGITS
    "female"          -0.08
    "uang"            -0.07
    "_extraction"     -0.06
    " возраст"        -0.06
    " 년도별"          -0.06
    " 신규"            -0.06
    " synonym"        -0.06
    " Exhaust"        -0.06
    "็บ"              -0.06
    "248"             -0.06
    POSITIVE LOGITS
    " peace"            0.17
    " Peace"            0.17
    "Peace"             0.11
    " reconcile"        0.08
                        0.07
    " Message"          0.07
    " reconciliation"   0.07
    " hòa"              0.07
    " السلام"           0.06
    "Speech"            0.06
    Activation Density: 0.007%

    No Known Activations