    Explanations

    abstract opinions and qualities

    Negative Logits
    1.51
    0.98  ↵↵
    0.94  <start_of_image>
    0.89  ↵↵↵
    0.82  。”
    0.81  !">
    0.80  .”)
    0.77  !”
    0.76  ?”
    0.76  .:
    Positive Logits
    1.20   [];
    1.19   ;,
    1.15  ;,
    1.11  $;
    1.09  `;
    1.09  >;</
    1.08  ;",
    1.07   {};
    1.06  }$;
    1.05  *;
    Activation Density 0.169%

    No Known Activations