INDEX
    Explanations

    references to weight loss and body transformation, particularly related to "love handles."

    Negative Logits
    Token                 Logit
    "Hover"               -0.15
    " Hoover"             -0.15
    "edback"              -0.15
    " Hover"              -0.14
    "ugo"                 -0.14
    "AMED"                -0.14
    "ABCDEFGHIJKLMNOP"    -0.14
    "hoff"                -0.13
    "HOOK"                -0.13
    "hover"               -0.13
    Positive Logits
    Token                 Logit
    " handle"             1.04
    " handling"           0.95
    " handles"            0.94
    " Handle"             0.94
    "handle"              0.92
    " handled"            0.87
    "-handle"             0.86
    "Handle"              0.86
    " handler"            0.85
    ".handle"             0.84
    Activation Density: 0.271%

    No Known Activations