INDEX
Explanations
up in, as a, like a, not broken, not applying
Negative Logits
Token          Logit
היו            0.57
صورة           0.55
እንቅስቃሴ         0.54
မျက်            0.52
필             0.52
gobiernos      0.52
ες             0.51
Δια            0.51
periodistas    0.51
ρου            0.50
Positive Logits
Token          Logit
an             0.66
a              0.66
in             0.64
e              0.61
f              0.61
h              0.60
ar             0.59
u              0.59
R              0.57
ed             0.56
Activations Density 0.001%