INDEX
Explanations
specific words related to quantitative measurements or values
Negative Logits
mut      -0.16
node     -0.16
ze       -0.16
Sloan    -0.16
ãĤµ      -0.16  (byte-level form of サ)
artz     -0.15
ZE       -0.15
mu       -0.15
node     -0.14
Z        -0.14
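Tokens such as ãĤµ look garbled because they are printed in the byte-level form used by GPT-2-style BPE tokenizers. That scheme maps each raw byte to a printable character. This sketch assumes the model uses such a tokenizer, which the token strings suggest but the dashboard does not state. It rebuilds the GPT-2 `bytes_to_unicode` table and inverts it to recover the underlying bytes. `token_to_bytes` and `decode_token` are hypothetical helper names.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2 style table mapping each byte value (0-255) to a printable character."""
    # Bytes that are already printable map to themselves: '!'..'~', '¡'..'¬', '®'..'ÿ'.
    printable = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    for b in range(256):
        if b not in printable:
            # Remaining bytes (control chars, space, etc.) are shifted to U+0100 and up.
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: printable character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a byte-level BPE token string."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def decode_token(token: str) -> str:
    """Decode the token's bytes as UTF-8; invalid or incomplete sequences become U+FFFD."""
    return token_to_bytes(token).decode("utf-8", errors="replace")


for tok in ["ãĤµ", "ìķĻ", "éŃ", "ç±", "Sloan"]:
    print(repr(tok), token_to_bytes(tok).hex(" "), repr(decode_token(tok)))
```

With this mapping, ãĤµ is the bytes E3 82 B5 (サ) and ìķĻ is EC 95 99 (앙). By contrast, éŃ (E9 AD) and ç± (E7 B1) are only the first two bytes of three-byte UTF-8 characters, so on their own they do not decode to a complete character.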
Positive Logits
anka     0.17
ìķĻ      0.17  (byte-level form of 앙)
/sdk     0.16
éŃ       0.16
quer     0.15
_PF      0.15
ç±       0.15
ника     0.15
Mer      0.15
agger    0.15
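Dashboards like this one commonly derive both lists from a single vector. They multiply the feature's decoder direction by the model's unembedding matrix, which gives one logit effect per vocabulary token, and then display the lowest and highest entries. This sketch assumes that recipe and ignores the final layer norm. The dashboard itself does not say how its values were computed. The names `decoder_dir`, `w_u`, `logit_effects`, and `top_and_bottom` are hypothetical, and plain lists stand in for tensors so the example has no dependencies.

```python
def logit_effects(decoder_dir: list[float], w_u: list[list[float]]) -> list[float]:
    """Project a feature direction (length d_model) through an unembedding
    matrix (d_model rows x vocab columns): one logit effect per token."""
    d_model, vocab_size = len(w_u), len(w_u[0])
    assert len(decoder_dir) == d_model, "direction and unembedding disagree on d_model"
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: list[float], vocab: list[str], k: int):
    """Return the k most negative and k most positive (token, effect) pairs."""
    k = max(0, min(k, len(effects)))  # clamp so slicing behaves for k = 0 or k > vocab size
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]  # most negative first
    positive = [(vocab[t], effects[t]) for t in reversed(order[len(order) - k:])]  # most positive first
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens.
w_u = [[2.0, 1.0, -1.0, 0.0],
       [1.0, 3.0, 2.0, 0.5]]
vocab = ["a", "b", "c", "d"]
effects = logit_effects([1.0, -1.0], w_u)
neg, pos = top_and_bottom(effects, vocab, k=2)
print(neg)  # [('c', -3.0), ('b', -2.0)]
print(pos)  # [('a', 1.0), ('d', -0.5)]
```

Each list in this dashboard has ten entries, which corresponds to `k=10` under this assumed recipe.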
Activation Density: 0.043%
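Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is nonzero. This sketch assumes that definition; the dashboard gives neither its threshold nor its sample size. Under it, 0.043% means the feature fired on roughly 43 of every 100,000 tokens. `activation_density` is a hypothetical helper, and its `threshold` parameter is an illustrative assumption.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    fired = 0
    for a in activations:
        total += 1
        if a > threshold:
            fired += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * fired / total


# Toy sample: the feature fires on 43 out of 100,000 tokens.
sample = [0.0] * 99_957 + [1.2] * 43
print(f"{activation_density(sample):.3f}%")  # 0.043%
```

Streaming over an iterable means the helper can consume activations batch by batch without holding the whole sample in memory.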