Explanations
phrases related to self-reference and existential concepts
Negative Logits
OGND  -0.95
NameInMap  -0.90
Majefty  -0.84
Yud  -0.84
richTextPanel  -0.83
voorbeeld  -0.75
Geplaatst  -0.74
actualité  -0.74
Prist  -0.72
виправивши  -0.69
Positive Logits
se  1.02
להת  0.93
haberse  0.90
sich  0.83
Se  0.82
Mc  0.75
'},  0.74
се  0.74
się  0.73
'),  0.72
Activation Density: 0.032%