INDEX
Explanations
references to groups of people or audiences in various contexts
relating to groups of people
people performing actions
Negative Logits
Token    Logit
ones     -0.50
pro      -0.49
the      -0.48
The      -0.45
(        -0.45
—        -0.45
among    -0.44
rov      -0.43
,        -0.41
po       -0.41
Positive Logits
Token            Logit
يتيمه            1.09
ValueStyle       1.00
GenerationType   0.94
propOrder        0.93
themſelves       0.87
الحره            0.86
ſtate            0.85
évaluateur       0.84
myſelf           0.83
ſhould           0.83
Activations Density 0.409%