INDEX
    NEGATIVE LOGITS
    Token             Logit
    " "               0.97
    "стом"            0.82
    "ח"               0.80
    " a"              0.76
    " הראש"           0.75
    "עות"             0.69
    "льский"          0.68
    " emplacement"    0.67
    " erreurs"        0.66
    (missing)         0.64
    POSITIVE LOGITS
    Token             Logit
    ":"               1.17
    "u"               0.97
    "Y"               0.86
    "S"               0.86
    "E"               0.83
    "o"               0.81
    "F"               0.78
    (missing)         0.77
    "ンジ"            0.76
    "\t"              0.75
    Activation Density: 0.001%

    No Known Activations