Explanations
instances of the word "me" and variations related to self-reference
Negative Logits
Token            Logit
SourceChecksum   -0.75
новниш           -0.73
متعلقه           -0.73
oneofs           -0.58
uscht            -0.57
مرئيه            -0.57
beginnetje       -0.55
Geplaatst        -0.54
ünstig           -0.54
UnknownFieldSet  -0.54
Positive Logits
Token    Logit
even     2.27
even     2.10
Even     1.87
EVEN     1.83
Even     1.79
EVEN     1.72
persino  1.62
навіть   1.60
Даже     1.56
incluso  1.54
Activations Density 0.262%
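
The page does not define the density figure. A minimal sketch of one common reading, assuming it is the fraction of sampled token positions on which the feature's activation is nonzero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 262 of them active.
sample = [0.0] * 99738 + [1.5] * 262
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.262% means the feature fires on roughly 1 in 380 sampled tokens.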