INDEX
    Explanations

    code snippets following backticks

    Negative Logits
    romantic    0.69
                0.67
     romantic   0.61
    ….          0.57
    rog         0.56
    ...         0.55
    ,...        0.53
    xiety       0.52
    を高        0.51
    ill         0.51
    Positive Logits
     `     1.60
     `$    1.45
     `'    1.35
     `<    1.33
     `"    1.32
     `=    1.31
     `${   1.27
     '$    1.26
     `-    1.25
     "$    1.22
    Act Density 4.948%

    No Known Activations