INDEX
Explanations
statistics comparing averages across different regions or groups
Negative Logits
ZD       -0.15
doll     -0.14
ara      -0.14
Lum      -0.14
ĽĪ       -0.14
strup    -0.14
ando     -0.14
oop      -0.14
cop      -0.13
Source   -0.13
Positive Logits
norm      0.26
norm      0.23
norms     0.23
levels    0.21
levels    0.20
level     0.20
baseline  0.19
value     0.19
Norm      0.18
Norm      0.17
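The page does not state how these logit lists were computed. One common approach for feature dashboards, assumed here rather than confirmed, is to project the feature's decoder direction through the model's unembedding matrix and report the most positive and most negative entries. The sketch below illustrates that idea; `top_logit_effects`, `w_dec_row`, `W_U`, and `vocab` are hypothetical names, and the toy matrix stands in for real model weights.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank tokens by the feature's direct effect on the output logits.

    Assumes w_dec_row has shape (d_model,) and W_U has shape
    (d_model, vocab_size); both layouts are assumptions for this sketch.
    """
    effects = w_dec_row @ W_U                  # one logit effect per vocab token
    order = np.argsort(effects)                # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy usage with made-up weights (not the real model).
w = np.array([1.0, 0.0])
W_U = np.array([[0.3, -0.2, 0.1, 0.0],
                [0.5, 0.5, 0.5, 0.5]])
pos, neg = top_logit_effects(w, W_U, ["a", "b", "c", "d"], k=2)
print(pos)  # most positive tokens first
print(neg)  # most negative tokens first
```

Sorting once with `argsort` and reading both ends avoids a second pass for the negative list.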
Activations Density 0.108%
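Activation density is usually read as the fraction of token positions on which the feature fires. A density of 0.108% (0.00108) corresponds to about 108 firing positions per 100,000 tokens. The minimal sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and that threshold are assumptions, not taken from the page.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy usage: 2 of 10 positions are active, so the density is 0.2 (20%).
toy_acts = [0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(toy_acts):.1%}")
```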