INDEX
    Explanations

    phrases that describe items or concepts using comparisons to familiar structures or forms

    Negative Logits
    Token          Logit
    " hozzá"       -0.36
    " disait"      -0.30
    "R"            -0.30
    "Rate"         -0.30
    "autonomie"    -0.29
    " ."           -0.29
    " Rate"        -0.29
    " foul"        -0.29
    " ("           -0.28
    " free"        -0.28
    Positive Logits
    Token               Logit
    " betweenstory"     0.71
    " kasarigan"        0.71
    "enterOuterAlt"     0.71
    " itſelf"           0.66
    "LabelTagHelper"    0.63
    "<unused43>"        0.62
    "<unused28>"        0.62
    "<unused51>"        0.62
    "<unused23>"        0.62
    "<unused14>"        0.62
    Activation Density: 0.217%

    No Known Activations