INDEX
    Explanations

    sections of code that include summaries or structured comments

    Negative Logits
    "ramer"       -0.16
    "arser"       -0.15
    " Contrast"   -0.14
    "otre"        -0.14
    "oggle"       -0.14
    "ters"        -0.14
    ".trade"      -0.14
    "ylvania"     -0.14
    "-toolbar"    -0.14
    "ι"           -0.14
    Positive Logits
    "ène"           0.16
    "İ"             0.15
    "çĵ"            0.15
    " è£"           0.15
    "(parseFloat"   0.14
    "ipv"           0.14
    " Bab"          0.14
    "cak"           0.14
    "γκα"           0.14
    " port"         0.14
    Activation Density: 0.005%

    No Known Activations