INDEX
    Explanations

    short phrases indicating importance, decision-making, and perspective

    instances of placeholder text or empty signals

    Negative Logits
     Instr        -0.70
     Clarkson     -0.69
     Hert         -0.65
     Oval         -0.65
     Mobil        -0.64
     Borders      -0.63
     Berk         -0.62
     Ninth        -0.62
     Wr           -0.61
     Front        -0.61
    Positive Logits
     ][            1.01
     ))            0.88
     )]            0.87
     )))           0.87
    _              0.84
     ));           0.82
    gpu            0.81
     ):            0.80
     );            0.80
     ]             0.80
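Each row in the two lists pairs a token with a score, and the lists appear to show the k lowest and k highest scores (k = 10 here). A minimal sketch of that selection, assuming the scores are available as a token-to-score mapping; the function name `top_and_bottom` and the extra entry `" the"` are hypothetical, and the other values are copied from the lists.

```python
import heapq


def top_and_bottom(scores, k):
    """Return the k lowest- and k highest-scoring (token, score) pairs.

    Assumption: the dashboard lists are a plain top-k / bottom-k selection
    over one score per vocabulary token.
    """
    bottom = heapq.nsmallest(k, scores.items(), key=lambda kv: kv[1])
    top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    return bottom, top


# A few (token, score) pairs taken from the lists, plus one hypothetical
# near-zero entry (" the") standing in for the rest of the vocabulary.
scores = {" Instr": -0.70, " Clarkson": -0.69, " ][": 1.01, " ))": 0.88, " the": 0.01}
bottom, top = top_and_bottom(scores, 2)
print(bottom)  # [(' Instr', -0.7), (' Clarkson', -0.69)]
print(top)     # [(' ][', 1.01), (' ))', 0.88)]
```

Leading spaces are part of the token strings, which is why " Instr" and "gpu" are written differently in the lists.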
    Act Density 0.168%
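Act Density is read here as the fraction of token positions on which the feature fires. That definition, including the "activation > 0" threshold, is an assumption, not something the page states. Under it, 0.168% corresponds to roughly 168 active positions per 100,000 tokens. A minimal sketch using a hypothetical `activation_density` helper:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as active when its activation is
    strictly greater than zero (the default threshold).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 168 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(168):
    acts[i] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.168%
```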

    No Known Activations