INDEX
Explanations
initiate action or proposal
Negative Logits
ل  0.52
ون  0.52
pendapat  0.49
८  0.48
امر  0.47
:'.  0.47
ﺍ  0.47
जानना  0.46
plantation  0.46
۴  0.46
Positive Logits
illuminate  0.53
rodite  0.52
enza  0.49
nea  0.49
dor  0.48
ting  0.47
herv  0.47
creat  0.46
fulfill  0.45
a  0.44
Activations Density 0.135%