INDEX
Explanations
No Explanations Found
Negative Logits
á	0.69
o	0.67
y	0.66
ren	0.63
den	0.62
ji	0.62
í	0.61
et	0.61
ina	0.60
ópez	0.59
Positive Logits
in	0.79
ING	0.77
:	0.76
ﻣ	0.70
the	0.69
ו	0.69
for	0.69
RY	0.68
أ	0.67
ﺗ	0.67
Activations Density: 5.466%