INDEX
Explanations
ratios and comparisons in the context of training or performance
Negative Logits
ери      -0.16
ері      -0.15
Kraj     -0.15
interop  -0.14
overe    -0.14
rys      -0.14
©        -0.14
medi     -0.14
éry      -0.14
lys      -0.14
Positive Logits
ratio    0.18
ratio    0.17
íd       0.16
ixe      0.15
contro   0.15
ixin     0.15
conto    0.14
ixa      0.14
entrant  0.14
upp      0.14
Activations Density 0.081%