INDEX
Explanations
references to organizations, institutions, or funding sources
Negative Logits
itſelf        -0.99
myſelf        -0.91
ſelves        -0.91
CHAPITRE      -0.90
bibfield      -0.85
themſelves    -0.85
ویکیپدیای     -0.84
ſelf          -0.83
Theſe         -0.82
raiſ          -0.82
Positive Logits
plus     0.68
+.       0.63
+        0.58
+,       0.56
label    0.55
AV       0.54
&        0.54
AV       0.52
Plus     0.52
–        0.49
Activations Density 0.233%