Explanations
accents and other languages
Negative Logits
is      0.73
it      0.70
on      0.67
It      0.64
obat    0.63
PTSD    0.62
ুরী     0.61
dosen   0.61
gourd   0.57
an      0.56
Positive Logits
ون      0.82
ه       0.76
áno     0.75
á       0.75
ز       0.72
ні      0.71
تش      0.71
ет      0.71
é       0.70
تين     0.69
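Lists of top positive and negative logits like the ones above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that method. It is not confirmed to be how this particular dashboard computes its values. The function name `top_logits` and its parameters are hypothetical, and plain lists stand in for real model weights.

```python
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def top_logits(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Hypothetical sketch: rank tokens by decoder_dir @ unembed.

    decoder_dir has length d_model; unembed is d_model x vocab_size,
    so score[t] = sum_i decoder_dir[i] * unembed[i][t].
    Returns (top_positive, top_negative) as (token, score) pairs.
    """
    d_model = len(decoder_dir)
    scores = [
        (tok, sum(decoder_dir[i] * unembed[i][t] for i in range(d_model)))
        for t, tok in enumerate(vocab)
    ]
    # Ascending order: the most negative scores come first.
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]
    # Reverse so the most positive scores come first.
    positive = ranked[::-1][:k]
    return positive, negative


positive, negative = top_logits(
    decoder_dir=[1.0, 0.0],
    unembed=[[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]],
    vocab=["a", "b", "c"],
    k=1,
)
print(positive, negative)
```

In this toy example, only the first unembedding row contributes because the second decoder component is zero. That makes "c" the top positive token and "b" the top negative one.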
Activation Density: 0.000%
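Activation density is usually read as the share of sampled tokens on which the feature fires. The sketch below assumes that definition, with "fires" meaning an activation strictly above zero. It also illustrates a rounding point: a value shown as 0.000% at three decimal places may be small but nonzero, not exactly zero. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of tokens with activation > threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        return 0.0  # no samples, so report zero density
    return 100.0 * active / total


# One active token out of a million samples.
density = activation_density([1.0] + [0.0] * 999_999)
print(f"{density:.3f}%")  # displays as 0.000% even though density > 0
```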