INDEX
    Explanations
    Negative Logits
     \""            -0.07
     written        -0.07
    brero           -0.07
    487             -0.07
    ocratic         -0.07
     rotation       -0.06
    -circle         -0.06
    BLUE            -0.06
     stretching     -0.06
     hue            -0.06
    Positive Logits
     passenger      0.09
     passengers     0.08
    abs             0.07
    ibbon           0.07
     Passenger      0.07
    aise            0.07
     Sens           0.07
     embar          0.07
    ovány           0.07
     tháng          0.07
    Activation Density: 0.004%

    No Known Activations