INDEX
Explanations
promoting or fostering concepts
NEGATIVE LOGITS
OU  0.43
ỏng  0.42
mengakses  0.40
requently  0.40
場合があります  0.40
estándar  0.38
ės  0.37
poteva  0.36
most  0.36
mū  0.36
POSITIVE LOGITS
encourages  0.61
encourage  0.59
promote  0.59
Encour  0.57
promote  0.57
Encourage  0.57
promotes  0.56
развитию  0.55
развитие  0.53
Promote  0.52
Activations Density 0.170%