Explanations
alleviating actions and effects
Negative Logits
den	0.42
jata	0.41
adlı	0.41
Yaman	0.40
ян	0.40
Prab	0.40
livres	0.39
মাতৃ	0.39
Cham	0.39
のリ	0.39

Positive Logits
alleviating	0.61
allevi	0.59
interventi	0.56
alleviate	0.56
alleviation	0.55
actuation	0.51
ដើម្បី	0.49
effecting	0.48
impactos	0.48
pitching	0.45

Activations Density 0.004%