Explanations
markers of structured data or notation, such as mathematical symbols or references
Negative Logits
Token       Logit
𝐮           -0.80
ruh         -0.78
𝐥           -0.75
ítmény      -0.75
redor       -0.73
riuscito    -0.73
pola        -0.73
bolt        -0.72
trib        -0.71
بيها        -0.70
Positive Logits
Token       Logit
$,          1.20
}}$,        1.04
}$,         1.03
]--;        0.99
\}$,        0.98
)}$,        0.98
$),         0.97
)$,         0.92
$).         0.91
))$.        0.90
Activations Density: 0.388%