INDEX
    Explanations

    instances of self-reflection and commentary on societal standards

    Negative Logits
    Token       Logit
    "apon"      -0.16
    "rames"     -0.15
    "assin"     -0.14
    "ehir"      -0.14
    "chein"     -0.14
    "oose"      -0.14
    "ungi"      -0.14
    "çe"        -0.14
    "omba"      -0.14
    "needle"    -0.14
    Positive Logits
    Token        Logit
    " would"     0.33
    "would"      0.31
    " Would"     0.28
    "Would"      0.28
    " wouldn"    0.27
    " würde"     0.22
    " serait"    0.20
    " seria"     0.19
    " wäre"      0.19
    " Wouldn"    0.18
    Activation Density: 0.125%

    No Known Activations