INDEX
    Explanations

    specific colors and keywords related to coding contexts

    Negative Logits
    (↵ marks a token that ends in a newline)
    ;';             -0.84
    }],↵            -0.83
    ]',             -0.82
    }}">            -0.82
    ]').            -0.77
     />';           -0.76
    ),”             -0.75
    }'.             -0.75
    ]));↵           -0.75
    ;">↵            -0.75
    Positive Logits
    ",              0.63
    (unrendered)    0.53
    "               0.52
    ","             0.49
    ”,              0.46
    ")              0.44
    ',              0.41
    ''              0.38
    (unrendered)    0.37
    \"              0.35
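    Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k most negative and k most positive token scores. The page does not say how these values were computed, so the sketch below assumes that procedure. `top_logit_effects`, `decoder_dir`, and `unembed` are hypothetical names, and plain Python lists stand in for real weight matrices.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def top_logit_effects(decoder_dir: List[float],
                      unembed: Dict[str, List[float]],
                      k: int = 10) -> Tuple[Ranked, Ranked]:
    """Hypothetical helper: rank tokens by the feature's direct logit effect.

    Assumes the effect on a token is the dot product of the feature's
    decoder direction with that token's unembedding vector.
    """
    effects = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]        # most-suppressed tokens first
    positive = ranked[::-1][:k]  # most-promoted tokens first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example with a three-token vocabulary.
    pos, neg = top_logit_effects(
        [1.0, 0.0],
        {"a": [0.63, 0.1], "b": [-0.84, 0.2], "c": [0.0, 5.0]},
        k=1,
    )
    print(pos, neg)
```

    Real dashboards may also apply normalization or scaling before ranking; this sketch ranks raw dot products only.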
    Activation Density 0.728%
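    Activation density is usually reported as the share of token positions on which the feature is active (activation above zero). Assuming that definition, 0.728% means the feature fires on roughly 728 of every 100,000 tokens. A minimal sketch follows; `activation_density` and its `threshold` parameter are hypothetical names, not part of any dashboard API.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions where the
    feature's activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 728 active positions out of 100,000 tokens.
    acts = [0.0] * 99_272 + [1.0] * 728
    print(f"{activation_density(acts):.3f}%")
```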

    No Known Activations