INDEX
Explanations
references to logical concepts and structures
Negative Logits
kening    -0.17
ê²        -0.17
argas     -0.15
zsche     -0.15
ego       -0.15
udeau     -0.14
etten     -0.14
lsen      -0.14
ushima    -0.14
marg      -0.14
Positive Logits
agma        0.17
olon        0.15
Naming      0.14
ãĤ°ãĥ«      0.14
ved         0.14
ç´Ģ         0.14
سÙĪØ¨      0.14
oron        0.14
aces        0.13
Sequential  0.13
Activations Density: 0.049%