Explanations
test scenarios and correctness
Negative Logits
clonal      0.44
GRANT       0.42
Appro       0.39
приблизи    0.39
Approx      0.38
physema     0.38
оте         0.38
лье         0.38
прода       0.38
τεί         0.38
Positive Logits
correctness      0.60
tests            0.59
symmetries       0.59
robustness       0.58
trigonometric    0.58
symmetry         0.56
tested           0.52
correctly        0.52
cases            0.51
错误             0.51
Activations Density: 0.317%