INDEX
Explanations
concepts related to familial connections and historical significance
Negative Logits
-0.16  ober
-0.14  iclass
-0.14  urum
-0.14  솔
-0.14  -------------------------------------------------------------------------
-0.13  emics
-0.13  ocas
-0.13  upos
-0.13  ASF
-0.13  olec
Positive Logits
0.26  instead
0.23  otherwise
0.23  Otherwise
0.22  вмест
0.22  Otherwise
0.20  instead
0.20  Instead
0.19  OTHERWISE
0.19  responsible
0.19  Instead
Activations Density 0.025%
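
As a rough reading of the density figure, the sketch below converts it into concrete counts. It assumes the density is the fraction of tokens on which the feature has a nonzero activation, which the dashboard does not state. The function names and the one-million-token sample size are hypothetical, chosen only for illustration.

```python
def tokens_per_activation(density_percent: float) -> float:
    """Average number of tokens between activations, assuming the
    density is the percentage of tokens with a nonzero activation."""
    return 100.0 / density_percent


def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected count of active tokens in a sample of n_tokens tokens."""
    return density_percent / 100.0 * n_tokens


if __name__ == "__main__":
    density = 0.025  # percent, as reported on the dashboard
    sample = 1_000_000  # hypothetical sample size
    print(round(tokens_per_activation(density)))           # about one in this many tokens
    print(round(expected_active_tokens(density, sample)))  # expected active tokens in the sample
```

Under that assumption, the feature fires on roughly one token in 4,000, so it is a sparse feature.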