Explanations
mathematical expressions and notation, particularly those involving powers and norms
Negative Logits
ál  -0.73
Laird  -0.62
le  -0.62
Diana  -0.58
ala  -0.58
lek  -0.58
ity  -0.57
B  -0.57
Vela  -0.55
ora  -0.54
Positive Logits
)^{  2.46
})^{  1.77
)^{\  1.68
)^  1.62
|^{  1.56
]^{  1.45
)^(  1.39
)_{  1.35
})^  1.32
)|^{  1.32
Activation Density  0.173%