INDEX
    Explanations

    Non-English fragments

    Negative Logits
    Token            Logit
    (not shown)      -0.08
    "🙎"             -0.07
    " Eug"           -0.07
    " nec"           -0.07
    "בוע"            -0.07
    "amous"          -0.07
    " MOST"          -0.07
    "けど"           -0.07
    "CONS"           -0.07
    "Knowing"        -0.07
    Positive Logits
    Token            Logit
    "Euro"           0.07
    " galer"         0.07
    (not shown)      0.07
    "两只"           0.06
    (not shown)      0.06
    (whitespace)     0.06
    "liament"        0.06
    (not shown)      0.06
    " مجر"           0.06
    "_ix"            0.06
    Activation Density: 0.108%

    No Known Activations