INDEX
Explanations
the presence of segments that contain high numerical values associated with parameters or configurations
Negative Logits
Rolf        -0.76
<()>        -0.67
0           -0.67
Betracht    -0.66
Rolf        -0.66
bg          -0.64
’?          -0.62
ipts        -0.61
Canucks     -0.59
équ         -0.59
Positive Logits
2.07
1.51
1.48
1.46
1.40
1.21
1.17
1.11
ſelf        1.10
1.05
Activations Density 0.030%