Explanations
Phrases centered around personal identity or self-reference.
First-person singular pronouns at the beginning of sentences or clauses.
Negative Logits
Token            Logit
autorytatywna    -0.59
DockStyle        -0.45
ControllerBase   -0.45
TextHelper       -0.43
Билгалдахарш     -0.41
ReactDOM         -0.40
_",              -0.40
nloa             -0.39
TagNumber        -0.39
}*/              -0.39
Positive Logits
Token            Logit
guess            0.59
digo             0.51
am               0.50
myself           0.48
admit            0.48
Tracce           0.48
hope             0.48
believe          0.47
Admit            0.47
however          0.47
Activations Density: 0.179%