Explanations
numerical values and mathematical notations
Negative Logits
/INFO       -0.16
ellas       -0.15
.mods       -0.15
emma        -0.15
gebra       -0.15
rouch       -0.15
ektor       -0.14
ège         -0.14
.identity   -0.14
ÙĬÙĦا       -0.14
Positive Logits
/+          0.19
/-          0.17
uster       0.16
amon        0.15
agation     0.15
syll        0.15
(-          0.15
sign        0.14
err         0.14
unit        0.14
Activations Density: 0.068%