INDEX
    Explanations

    references to the name "Dave"

    Negative Logits
    loh       -0.08
    æķ        -0.07
    orners    -0.07
    vanished  -0.07
    ëĵĿ       -0.07
    ίθ        -0.07
     kraje    -0.07
    isper     -0.07
    elts      -0.07
    Ñģион     -0.07
    Positive Logits
    y         0.12
    igh       0.07
     Patch    0.06
    onium     0.06
    िड        0.06
    yb        0.06
    ذا        0.06
     beaut    0.06
    IGH       0.05
    works     0.05
    Activation Density: 0.004%

    No Known Activations