    Explanations

    References to the token "pur" and its variants (such as "Pur", "PUR", "pure", "purge", and "purification"), suggesting a focus on purity or related themes.

    Negative Logits
        Token              Logit
        "uteur"            -0.16
        "iveau"            -0.16
        "nio"              -0.16
        "arendra"          -0.16
        "ферен"            -0.15
        "歯"               -0.15
        "لام"              -0.14
        "illet"            -0.14
        "цин"              -0.14
        "align"            -0.14

    Positive Logits
        Token              Logit
        " Pur"              0.28
        " pur"              0.27
        "pur"               0.26
        "poses"             0.20
        " PUR"              0.19
        "posed"             0.19
        "POSE"              0.17
        " purge"            0.17
        " purification"     0.17
        "pure"              0.17

    Activation Density: 0.012%

    No Known Activations