Explanations
promoting positive or harmful outcomes
Negative Logits
implications    0.51
Implications    0.47
intentar        0.45
:"+             0.42
Impact          0.41
determinar      0.40
Impact          0.39
imposed         0.39
meghatá         0.39
インパクト      0.39
Positive Logits
growth          0.68
awareness       0.63
crescimento     0.58
uptake          0.57
healthy         0.57
innovation      0.55
creativity      0.55
wzrost          0.55
growth          0.54
togetherness    0.54
Activations Density 0.020%