INDEX
    Explanations
    Negative Logits
    Token                Logit
    '"));'               -1.42
    'babies'             -1.38
    ' does'              -1.37
    '</h2>'              -1.36
    ' &'                 -1.34
    ' allow'             -1.31
    (whitespace-only)    -1.30
    ' those'             -1.29
    ' 家居'              -1.28
    ' will'              -1.27
    Positive Logits
    Token                Logit
    ' dari'              1.70
    ' terke'             1.61
    ' olvido'            1.61
    ' informé'           1.60
    ' menyadari'         1.54
    ' gorro'             1.52
    'ことがある'         1.50
    (token not shown)    1.49
    ' botanique'         1.48
    ' trouverez'         1.48
    Activation Density: 0.024%

    No Known Activations