    Explanations

    references to text formatting, especially focusing on the word "format"

    mentions of different text formats and formatting instructions

    Negative Logits
    Token          Logit
    "roma"         -0.85
    "doms"         -0.80
    "arma"         -0.75
    "hiro"         -0.72
    "worth"        -0.70
    "guard"        -0.70
    "nee"          -0.68
    "ĺħ"           -0.67
    " Michele"     -0.66
    "vironment"    -0.66
    Positive Logits
    Token          Logit
    "ters"         1.04
    "ting"         0.83
    "atted"        0.82
    " format"      0.79
    "tered"        0.77
    "tering"       0.74
    " formats"     0.74
    "etter"        0.72
    "aldehyde"     0.71
    "furt"         0.70
    Activation Density: 0.037%

    No Known Activations