Explanations
phrases related to concerns about problems and allegations in various contexts
Negative Logits
rungsseite    -0.83
transQ        -0.82
<unused42>    -0.80
<unused23>    -0.80
<unused76>    -0.80
<unused41>    -0.79
<unused43>    -0.79
<unused28>    -0.79
[@BOS@]       -0.79
<unused8>     -0.79
Positive Logits
подоб         0.56
such          0.52
solchen       0.52
solche        0.49
similar       0.45
solcher       0.43
这类          0.43
like          0.43
こういう      0.42
这样的        0.42
Activations Density 0.524%