    Explanations

    emotional expressions or statements related to personal experiences and relationships

    Negative Logits
    Token         Logit
     zuſammen     -0.90
     zwiſchen     -0.87
     deſſen       -0.85
     queſta       -0.84
     ſoll         -0.83
    ſſung         -0.83
     Weiſe        -0.82
    ſicht         -0.82
    <unused38>    -0.82
    <unused12>    -0.82
    Positive Logits
    Token         Logit
    }$}           0.69
    })$}          0.62
     }}$}         0.57
    )』           0.57
    ")"           0.57
    ');?>         0.51
    ']."          0.48
    )」           0.48
    »»            0.48
    "}")          0.47
    Activation Density: 2.675%

    No Known Activations