    Explanations

    sections detailing numerical results and findings in scientific papers

    Negative Logits
    Token               Logit
    " hypno"            -0.43
    " Roskov"           -0.41
    "flip"              -0.39
    " grayscale"        -0.38
    "hyd"               -0.37
    "WriteAttribute"    -0.37
    "wol"               -0.37
    " detox"            -0.36
    "poll"              -0.36
    "fantasy"           -0.36
    Positive Logits
    Token               Logit
    " betweenstory"     0.66
    " Arbeiten"         0.64
    " Erkenntnisse"     0.62
    " results"          0.61
    " resultaten"       0.60
    " arbejde"          0.60
    " arbete"           0.60
    " Ergebnisse"       0.57
    " trabajos"         0.57
    " hasil"            0.56
    Activation Density: 0.071%

    No Known Activations