NEGATIVE LOGITS
Token           Logit
Replace         0.53
rewrite         0.45
Analysis        0.44
undermines      0.43
obfusc          0.43
preprocessing   0.43
Error           0.42
replace         0.42
Theorem         0.42
Translation     0.42
POSITIVE LOGITS
Token           Logit   English gloss
budget          0.66
interesses      0.64    interests
予算            0.64    budget
preferencias    0.62    preferences
бюджет          0.61    budget
intereses       0.60    interests
tercih          0.59    preference
buget           0.59    budget
Budget          0.58
여행            0.56    travel
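A common way to produce two lists like these is to project a feature's decoder direction through the model's unembedding matrix and keep the k most suppressed and k most promoted tokens. The sketch below assumes that definition; `top_logit_tokens`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names, and the toy numbers are illustrative rather than taken from the model behind this page. In this sketch suppressed tokens come back with negative values, whereas the listing above shows them without a sign.

```python
import numpy as np


def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by how strongly a feature direction pushes their logits.

    Assumption (not confirmed by the source): the logit effect of a feature
    is its decoder direction multiplied by the unembedding matrix.
    decoder_dir: shape (d_model,); unembed: shape (d_model, vocab_size).
    """
    effects = decoder_dir @ unembed   # one logit effect per vocabulary token
    order = np.argsort(effects)       # ascending: most suppressed first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["budget", "Replace", "rewrite", "travel"]
decoder_dir = np.array([1.0, 0.0])
unembed = np.array([[0.66, -0.53, -0.45, 0.10],
                    [0.20, 0.30, -0.10, 0.00]])
neg, pos = top_logit_tokens(decoder_dir, unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```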
Activation density: 0.190%
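Activation density is usually read as the share of sampled token positions on which the feature fires at all. The sketch below assumes that definition, with "fires" meaning an activation strictly above a threshold of 0; `activation_density` is a hypothetical helper, and the sample size is illustrative. Under this definition, 0.190% corresponds to 19 active positions in every 10,000.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: density = (# positions with activation > threshold) / (# positions).
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy usage: 10,000 token positions, the feature fires on 19 of them.
acts = np.zeros(10_000)
acts[np.arange(19) * 500] = 1.5
print(f"{activation_density(acts):.3f}%")
```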