INDEX
Explanations
emotional expressions or statements related to personal experiences and relationships
Negative Logits
zuſammen      -0.90
zwiſchen      -0.87
deſſen        -0.85
queſta        -0.84
ſoll          -0.83
ſſung         -0.83
Weiſe         -0.82
ſicht         -0.82
<unused38>    -0.82
<unused12>    -0.82
Positive Logits
}$}       0.69
})$}      0.62
}}$}      0.57
)』       0.57
")"       0.57
');?>     0.51
']."      0.48
)」       0.48
»»        0.48
"}")      0.47
Activations Density: 2.675%