INDEX
Explanations
violate guidelines or rules
Negative Logits
forbindelse    0.83
னை    0.82
защи    0.79
во    0.77
汅    0.76
μαζί    0.75
৩৮    0.75
শ্রমিক    0.75
뛸    0.75
    0.75
Positive Logits
h    1.12
standards    1.01
normas    1.00
norms    0.98
norme    0.96
требований    0.91
rules    0.91
вимо    0.90
stringent    0.90
requirements    0.89
Activations Density 0.435%