INDEX
Explanations
elements related to organizational or structural components
Negative Logits
zelf        -0.22
apses       -0.15
isor        -0.15
andalone    -0.15
gable       -0.14
/is         -0.14
ãĤĤ         -0.14
åºľ         -0.14
sel         -0.14
ugas        -0.14
Positive Logits
/-                                  0.31
++++++++++++++++++++++++++++++++    0.29
++++                                0.25
++↵                                 0.22
++++++++                            0.21
++++++++++++++++                    0.21
ieurs                               0.18
holding                             0.18
ça                                  0.17
++.                                 0.17
Activations Density 0.118%