INDEX
Explanations
perfectly followed by positive descriptors
Negative Logits
Below       0.45
%           0.41
Dec         0.40
Sub         0.40
West        0.39
Bast        0.38
char        0.37
care        0.37
Equality    0.37
Since       0.37
Positive Logits
<unused2121>    0.48
kesehatan       0.46
Един            0.45
zuen            0.44
opportunità     0.44
كانت            0.44
<unused1794>    0.43
<unused2127>    0.43
न्हा            0.43
وكانت           0.43
Activations Density 0.004%