INDEX
Explanations
list items separated by periods
Negative Logits
,    0.38
to    0.32
،    0.31
ligands    0.29
torso    0.29
women    0.29
invitations    0.29
…,    0.29
orchids    0.29
messengers    0.28
Positive Logits
ني    0.35
።    0.33
።    0.32
appunto    0.32
畱    0.32
诨    0.32
.³    0.31
كي    0.31
െടുത്തു    0.30
难度    0.30
Activations Density: 2.249%