INDEX
Explanations
indicating purpose or benefit
Negative Logits
i    0.58
ל    0.46
י    0.45
OF    0.42
ﺭ    0.42
し    0.42
レ    0.40
ﻭ    0.40
are    0.38
の通販    0.38
Positive Logits
๒    0.52
២    0.50
🌱    0.49
۩    0.49
😧    0.48
to    0.47
číslo    0.47
     0.46
de    0.46
ong    0.46
Activations Density 0.047%