Explanations
exploitative or offensive content
Negative Logits
slightly      0.52
একটু          0.46
facilitation  0.45
కొన్ని          0.44
ඔබේ           0.43
слегка        0.43
:)            0.43
légèrement    0.43
facilitate    0.43
trochu        0.43
Positive Logits
Major         0.51
No            0.44
林            0.43
major         0.43
MAJOR         0.43
ieving        0.42
Initial       0.42
THIS          0.41
Finally       0.41
THE           0.40
Activations Density: 1.747%