INDEX
Explanations
references to degrees of murder or other legal classifications
Negative Logits
sink      -0.18
è¨        -0.15
ész       -0.15
duk       -0.14
dirt      -0.14
ντ        -0.14
ARTH      -0.14
ande      -0.14
itesse    -0.14
carpet    -0.13
Positive Logits
-degree   0.42
degree    0.42
degree    0.37
Degree    0.34
Degree    0.28
egree     0.26
.degree   0.26
_degree   0.26
grado     0.24
degrees   0.22
Activations Density 0.005%