Explanations
coal, oil, gold, honey, coffee
Negative Logits
ہ    0.93
ین    0.83
л    0.71
وک    0.71
ف    0.70
ز    0.67
ک    0.63
in    0.61
наук    0.61
друга    0.60
Positive Logits
that    0.83
K    0.79
I    0.75
X    0.70
AY    0.69
F    0.68
you    0.65
J    0.65
O    0.64
N    0.64
Activations Density: 0.398%