INDEX
    Explanations

    mentions of sugar, whether in a negative context (sugar-free) or referring to its effects

    Negative Logits
    orial    -0.17
    RLF      -0.16
    eh       -0.15
    .dds     -0.15
    .dtd     -0.15
    sa       -0.15
    sing     -0.15
    sel      -0.15
    mgr      -0.15
    ek       -0.14
    Positive Logits
     cane     0.26
    coat      0.26
    CRM       0.23
    crm       0.23
    Cube      0.19
     refin    0.19
    imoto     0.18
    尿        0.18
     Zucker   0.18
     daddy    0.18
    Act Density 0.009%

    No Known Activations