INDEX
    Explanations

    mentions of formal language or classifications in descriptions

    Negative Logits
     Lucius
    -0.56
    gu
    -0.46
     biling
    -0.46
     Tiberius
    -0.46
    äu
    -0.45
     fluo
    -0.45
     relais
    -0.43
     Sully
    -0.42
    ıyors
    -0.42
     Giovanna
    -0.42
    Positive Logits
     nor
    1.15
    而是
    0.92
    nor
    0.87
     melainkan
    0.87
     CreateTagHelper
    0.82
    むしろ
    0.82
     sondern
    0.77
     بلکه
    0.75
     Nor
    0.72
     tampoco
    0.70
    Act Density 2.935%

    No Known Activations