INDEX
    Explanations

    references to specific pairs or categories of information

    Negative Logits
    " the"       -0.48
    " T"         -0.48
    "/"          -0.47
    " and"       -0.47
    " ("         -0.47
    (missing)    -0.45
    ","          -0.44
    "<i>"        -0.43
    "2"          -0.43
    "  "         -0.42
    Positive Logits
    " queſta"        0.99
    " avoient"       0.91
    " plufieurs"     0.85
    "approximate"    0.85
    "ſammen"         0.85
    " nôtre"         0.84
    " Monfieur"      0.84
    "accurate"       0.84
    "<unused43>"     0.83
    " zwiſchen"      0.83
    Activation Density: 0.362%

    No Known Activations