INDEX
Explanations
references to personal identity and ownership
possessive pronouns and self-references
Negative Logits
يتيمه
-0.59
SuppressLint
-0.46
-0.46
irited
-0.45
ſche
-0.44
оригіналу
-0.43
{#
-0.43
समीक्षाओं
-0.43
meisje
-0.43
ⓧ
-0.43
Positive Logits
own
0.56
自己
0.51
自己是
0.50
MessageTagHelper
0.47
自分は
0.47
自己的
0.47
自己在
0.47
فريبيس
0.45
themselves
0.43
themselves
0.42
Activations Density 0.069%
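As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that activation density means the fraction of sampled token positions on which the feature's activation is above zero. That definition, the function name activation_density, and the sample data are assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled positions where the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 69 of which activate the feature.
sample = [0.0] * 99_931 + [1.5] * 69
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.069%"
```

Under that reading, a density of 0.069% would mean the feature fires on roughly 7 of every 10,000 tokens, which suggests a sparse feature, in line with the narrow explanations listed above.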