INDEX
    Explanations

    neutrals or terms without significant activation signals

    Code, URLs, or file paths

    Negative Logits
    issante       -0.46
    <eos>         -0.44
     “            -0.43
    (blank)       -0.42
    doctype       -0.42
    ↵↵            -0.42
    (whitespace)  -0.41
     No           -0.41
    (blank)       -0.41
     dasar        -0.41
    Positive Logits
    uxxxx         0.90
     houſe        0.73
     AppColors    0.72
     ſche         0.69
     الحره        0.69
     ſind         0.67
    Datuak        0.67
    ſelves        0.66
     faſt         0.66
     Mémoires     0.66
    Act Density 0.012%

    No Known Activations