INDEX
    Explanations

    descriptive phrases that compare actions or qualities, often emphasizing effectiveness and moral implications

    Negative Logits
    ucha
    -0.16
    alach
    -0.16
    krv
    -0.14
    ->___
    -0.14
    auge
    -0.14
    tet
    -0.14
    rox
    -0.14
    ynthia
    -0.14
    иту
    -0.14
     functioning
    -0.13
    Positive Logits
     justice
    0.27
    cket
    0.24
     things
    0.23
     damage
    0.20
     Justice
    0.20
    justice
    0.20
     thing
    0.19
     Damage
    0.19
     work
    0.19
    Justice
    0.19
    Act Density 0.243%

    No Known Activations