INDEX
    Explanations
    Negative Logits
    angel
    -0.07
    orch
    -0.07
    ran
    -0.07
     ↵            ↵
    -0.06
     ↵	↵
    -0.06
     Chr
    -0.06
    -0.06
     رز
    -0.06
    ↵        ↵
    -0.06
    ↵            ↵
    -0.06
    Positive Logits
     tip
    0.21
    Tip
    0.15
     Tip
    0.15
    tip
    0.15
     tipping
    0.14
     tipped
    0.13
    -tip
    0.12
    _tip
    0.11
     tips
    0.11
    ip
    0.11
    Act Density 0.010%

    No Known Activations