INDEX
Explanations
theologian, -ologist, -ological
Negative Logits
c          1.02
x          0.78
don        0.74
kenalkan   0.70
d          0.70
dır        0.69
dW         0.68
dalam      0.67
dum        0.63
dah        0.63
POSITIVE LOGITS
is         1.02
م          0.98
as         0.97
是         0.96
ت          0.94
ing        0.91
to         0.87
ic         0.85
ig         0.83
جي         0.82
Activations Density 0.001%