    Explanations

    high activation values associated with specific medical or scientific terminology

    Negative Logits
     $_[      -0.71
     DeV      -0.71
     Vill     -0.69
              -0.67
    tfrac     -0.67
     Goy      -0.67
    er        -0.66
     Gerr     -0.66
    tam       -0.65
     Osh      -0.64

    Positive Logits
    })*/      1.30
    }))       1.27
    ]")]      1.23
    ]})       1.23
    })()      1.22
    }))       1.19
    ']")      1.12
    })));     1.11
    )})       1.08
    }])       1.08
    Act Density 0.166%

    No Known Activations