INDEX
    Explanations

    references to the word "Man" and its various forms

    Negative Logits
    "<bos>"          -1.99
    "harmed"         -0.66
    " resolve"       -0.65
    "break"          -0.59
    " get"           -0.58
    "INVISIBLE"      -0.57
    (not rendered)   -0.56
    "resolve"        -0.56
    "  "             -0.55
    " find"          -0.55
    Positive Logits
    " ftu"            1.42
    " Mémoires"       1.38
    " Cfr"            1.36
    " Juf"            1.36
    " Bartholo"       1.35
    " Abbé"           1.35
    " fup"            1.34
    " ftre"           1.33
    " fep"            1.33
    " xxv"            1.33
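    The logit lists pair vocabulary tokens with scores. Dashboards of this kind commonly produce such scores by projecting the feature's decoder direction through the model's unembedding matrix and ranking the whole vocabulary; whether these particular numbers were computed exactly this way (for example, with or without folding in a final layer norm) is an assumption, not something this page states. The pure-Python sketch below shows that projection and ranking. The function names `direct_logit_effects` and `top_and_bottom`, the `[d_model][vocab_size]` matrix layout, and the toy vocabulary are all hypothetical.

```python
from typing import List, Sequence, Tuple


# Hypothetical helper: score each vocabulary token by the dot product of the
# feature's decoder direction with that token's unembedding column.
def direct_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed layout: [d_model][vocab_size]
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    scores = []
    for t, token in enumerate(vocab):
        score = sum(decoder_dir[j] * unembed[j][t] for j in range(d_model))
        scores.append((token, score))
    return scores


# Hypothetical helper: the k most negative and the k most positive tokens.
def top_and_bottom(
    scores: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]                  # most negative first
    positive = list(reversed(ranked))[:k]  # most positive first
    return negative, positive


# Toy example: a 2-dimensional model with a 4-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, -1.0, 0.5, 0.1],
    [0.4, 0.6, -1.0, 0.0],
]
vocab = ["a", "b", "c", "d"]
negative, positive = top_and_bottom(
    direct_logit_effects(decoder_dir, unembed, vocab), k=2
)
print("negative:", negative)
print("positive:", positive)
```

    In the toy run, "b" and "a" come out as the two most negative tokens and "c" and "d" as the two most positive. A real dashboard would do the same ranking over tens of thousands of tokens, typically with a vectorized matrix product rather than Python loops.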
    Activation Density 0.117%
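    The activation density is a percentage of tokens. A common definition, assumed here rather than confirmed by this page, is the share of tokens in an evaluation corpus on which the feature's activation is strictly above zero; under that reading, 0.117% corresponds to roughly 117 tokens in every 100,000. A minimal sketch of the arithmetic, using a hypothetical `activation_density` function and made-up activations:

```python
from typing import Iterable


# Hypothetical helper: percentage of tokens on which a feature is "active",
# where active is assumed to mean an activation strictly above `threshold`.
def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("at least one activation is required")
    return 100.0 * active / total


# Toy data: a feature that fires on 117 of 100,000 tokens.
acts = [0.0] * (100_000 - 117) + [2.5] * 117
print(f"{activation_density(acts):.3f}%")
```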

    No Known Activations