Negative Logits
Token         Logit
meaningful    0.78
désigne       0.77
worthwhile    0.76
véritables    0.72
Better        0.72
څ             0.71
injust        0.71
真正的        0.71
echte         0.70
BETTER        0.70
Positive Logits
Token         Logit
conservative  1.66
aggressive    1.62
aggressive    1.53
aggressively  1.35
conservative  1.34
pragmatic     1.30
lenient       1.29
permissive    1.29
Conservative  1.28
informal      1.25
Activations Density: 0.893%