Explanations
dominant groups, narratives, or frequency
Negative Logits
اً  0.49
我们  0.43
बद्ध  0.43
我们  0.42
to  0.42
बढ़ोतरी  0.42
EC  0.42
Mastery  0.42
اص  0.41
refinements  0.41
Positive Logits
dominante  0.66
dominant  0.64
domin  0.61
dominance  0.57
domin  0.56
influence  0.54
Domin  0.53
influencia  0.52
Domin  0.51
dominating  0.50
Activation Density: 0.017%