INDEX
    Explanations

    punctuation marks, particularly quotation marks and apostrophes

    Negative Logits
    Token       Logit
    " Carol"    -0.15
    "riv"       -0.15
    " Gent"     -0.15
    "eto"       -0.15
    " Ward"     -0.14
    " infl"     -0.14
    "andas"     -0.14
    " Barb"     -0.14
    " perk"     -0.14
    " stud"     -0.13
    Positive Logits
    Token        Logit
    " %"         0.24
    "format"     0.24
    ".format"    0.20
    " format"    0.19
    "格式"       0.17
    " %("        0.17
    " Format"    0.17
    "-format"    0.16
    " formats"   0.16
    "arg"        0.16
    Act Density 0.004%

    No Known Activations