INDEX
Explanations
phrases and words emphasizing self-reference and identity in various contexts
Negative Logits
iastes      -0.77
uſed        -0.65
Monfieur    -0.63
بوابة       -0.61
lätt        -0.60
whoſe       -0.60
CURIAM      -0.60
sauvages    -0.59
hunne       -0.59
Chriftian   -0.59
Positive Logits
itself       1.47
itself       1.35
Itself       1.29
本身         0.94
самого       0.88
herself      0.87
himself      0.87
themselves   0.85
themselves   0.84
Сам          0.81
Activations Density: 0.197%