INDEX
Explanations
mentions of personal identity and name references
the speaker or author
my self-references
Negative Logits
ſſung           -0.77
featureID       -0.77
aarrggbb        -0.69
ſeveral         -0.68
ſol             -0.68
utafitiHapana   -0.68
ſta             -0.68
ðsíða           -0.67
ſei             -0.67
ſoll            -0.65
Positive Logits
my              0.38
meinen          0.35
me              0.32
staw            0.30
myself          0.30
我              0.29
我的            0.28
Teilen          0.28
#!/             0.27
miei            0.26
Activations Density: 0.352%