INDEX
    Explanations
    Negative Logits
        "onders"       -0.55
        " Hinton"      -0.49
        " Desmond"     -0.49
        "unate"        -0.49
        " Griffiths"   -0.49
        " delights"    -0.48
        "roppo"        -0.48
        " concise"     -0.48
        ">`;"          -0.48
        " Delight"     -0.47
    Positive Logits
        "Car"          0.75
        "Vehicle"      0.75
        "Cars"         0.73
        "Boat"         0.71
        "Automobile"   0.70
        "cars"         0.70
        " Cars"        0.68
        "Boats"        0.68
        "Airplane"     0.68
        " cars"        0.67
    Act Density: 0.105%

    No Known Activations