INDEX
Explanations
phrases or quantities related to measurements and dimensions
Negative Logits
gin      -0.18
ght      -0.15
aval     -0.15
bor      -0.14
indr     -0.13
ÙĪØµ     -0.13
اط       -0.13
cola     -0.13
ney      -0.13
Ùij      -0.13
Positive Logits
ths        0.20
ÑģÑı       0.19
istrator   0.18
ed         0.17
-plus      0.17
imals      0.15
leine      0.15
-sama      0.15
/-         0.15
theless    0.14
Activations Density: 0.049%