INDEX
Explanations
detect, process, measure, disable
Negative Logits
n      0.71
use    0.68
o      0.63
r      0.63
il     0.62
u      0.61
g      0.60
pe     0.60
ir     0.59
w      0.59
Positive Logits
diri         0.70
araştırm     0.66
فيد          0.59
lira         0.58
zar          0.57
contenido    0.57
dipende      0.56
וה           0.55
strumento    0.55
bahkan       0.55
Activations Density 0.000%