INDEX
Explanations
phrases or structures that describe examples or clarifications
Negative Logits
purpoſe  -0.77
ſub  -0.67
Juifs  -0.63  (French: "Jews")
ſen  -0.61
Diſ  -0.61
ſmall  -0.61
greateſt  -0.61
houſe  -0.60
Anſ  -0.59
"]);  -0.58
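Positive Logits
например  1.01  (Russian: "for example")
例えば  1.01  (Japanese: "for example")
like  0.99
bijvoorbeeld  0.98  (Dutch: "for example")
např  0.96  (Czech abbreviation of "například": "for example")
比如  0.95  (Chinese: "for example")
voorbeeld  0.94  (Dutch: "example")
like  0.93
beispielsweise  0.93  (German: "for example")
たとえば  0.93  (Japanese: "for example")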
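The negative and positive lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. Dashboards of this kind commonly build such lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the extremes; this page does not confirm that method, so treat the following as a minimal sketch under that assumption. The names decoder_dir, W_U, vocab and top_logit_effects are hypothetical, and the sketch ignores any final layer norm a real model may apply before unembedding.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k lowest and k highest direct logit effects of one feature.

    decoder_dir: (d_model,) decoder vector of the feature (hypothetical input)
    W_U: (d_model, vocab_size) unembedding matrix (hypothetical input)
    vocab: token strings indexed by token id
    """
    effects = decoder_dir @ W_U          # one logit contribution per vocab token
    order = np.argsort(effects)          # token ids, smallest effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.7, 1.0, -0.2],
                [0.0, 0.0, 0.0, 0.0]])
vocab = ["a", "b", "c", "d"]
negative, positive = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(negative)  # [('b', -0.7), ('d', -0.2)]
print(positive)  # [('c', 1.0), ('a', 0.5)]
```

Because the projection is a single matrix-vector product, it measures only the feature's direct effect on the output logits, not effects routed through later layers.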
Activation Density: 0.923%
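Activation density is usually read as the share of token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above a threshold of 0 (the threshold and the function name activation_density are illustrative assumptions, not taken from this page):

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())

# Toy example: the feature fires on 1 of 4 tokens.
print(f"{activation_density([0.0, 0.0, 1.2, 0.0]):.3%}")  # 25.000%
```

Under this reading, a density of 0.923% means the feature fires on fewer than 1 in 100 tokens.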