    Negative Logits
    " DOT"                -0.79
    " dot"                -0.78
    (token not rendered)  -0.72
    " क्रम"               -0.71
    "طرف"                 -0.70
    "romas"               -0.69
    "gnose"               -0.69
    " μην"                -0.69
    " sie"                -0.68
    (token not rendered)  -0.68
    Positive Logits
    " hook"               2.41
    " hooks"              2.36
    " use"                2.34
    "hooks"               2.34
    " Hooks"              2.14
    " Hook"               2.13
    "Hooks"               2.08
    "Hook"                2.02
    "hook"                1.98
    "use"                 1.89
    Activation Density: 0.025%

    No Known Activations