INDEX
Explanations
phrases that describe items or concepts using comparisons to familiar structures or forms
ending in like
Negative Logits
hozzá       -0.36
disait      -0.30
R           -0.30
Rate        -0.30
autonomie   -0.29
.           -0.29
Rate        -0.29
foul        -0.29
(           -0.28
free        -0.28
Positive Logits
betweenstory     0.71
kasarigan        0.71
enterOuterAlt    0.71
itſelf           0.66
LabelTagHelper   0.63
<unused43>       0.62
<unused28>       0.62
<unused51>       0.62
<unused23>       0.62
<unused14>       0.62
Activations Density 0.217%