Explanations
latex document setup and styling
Negative Logits
"})
0.56
""))
0.52
")))
0.50
"});
0.50
())))
0.46
Swezey
0.45
")}}
0.43
."))
0.43
())),
0.43
')))
0.43
Positive Logits
]{
1.10
={
0.78
=
0.76
!]
0.73
}]{
0.72
]{\
0.70
]=
0.70
)]{
0.68
}{
0.63
.]:
0.61
Activations Density 0.003%