    Explanations

    references to categories or types of objects and their attributes

    Negative Logits
    ésultats      -0.94
    majánló       -0.91
    iſchen        -0.89
    ſehen         -0.88
     ſind         -0.87
    ſicht         -0.85
    <unused8>     -0.83
    <unused14>    -0.83
    [@BOS@]       -0.83
    <pad>         -0.83
    Positive Logits
    ,             0.50
    </h3>         0.46
    </h4>         0.44
     and          0.43
    -             0.43
                  0.43
    '             0.38
    .             0.38
     Withers      0.38
    </h2>         0.37
    Activation Density: 1.147%

    No Known Activations