INDEX
    Explanations
    Negative Logits
     Gallimard
    -0.52
     Muda
    -0.51
     Krone
    -0.50
     choque
    -0.50
     Mound
    -0.49
     ſever
    -0.48
    Scout
    -0.48
    那里
    -0.47
     Huerta
    -0.47
     ape
    -0.47
    Positive Logits
    </
    0.95
    ("</
    0.81
    '</
    0.78
     "</
    0.76
     </
    0.74
    .'</
    0.73
    "</
    0.72
     '</
    0.69
    ----</
    0.69
    )</
    0.67
    Activation Density 0.068%

    No Known Activations