INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.08
    1:0.06
    2:0.07
    3:0.07
    4:0.10
    5:0.08
    6:0.07
    7:0.08
    8:0.07
    9:0.08
    10:0.10
    11:0.08
    Negative Logits
    " offending"  -1.65
    "lessly"  -1.57
    " ********************************"  -1.56
    " unavoidable"  -1.52
    " indisc"  -1.52
    " safely"  -1.51
    "eware"  -1.51
    "ardless"  -1.51
    " containing"  -1.46
    "uously"  -1.45
    Positive Logits
    "Flight"  2.08
    "shapeshifter"  1.76
    "GPU"  1.72
    [token not displayed]  1.68
    "rists"  1.67
    " Robotics"  1.66
    "Interest"  1.66
    "rius"  1.62
    " Dynamics"  1.61
    "mosp"  1.61
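    The two lists above are ranked extremes of a per-token score: the ten most negative and the ten most positive values. A minimal sketch of that selection follows, assuming (hypothetically) that a score has already been computed for every vocabulary token, for example by projecting the feature's output direction onto the unembedding. The function name top_and_bottom_logits and the toy input are illustrative only, not this page's actual pipeline.

```python
# Hypothetical sketch: select the k most positive and k most negative
# tokens from a precomputed per-token score (assumed, not taken from the page).
from typing import Dict, List, Tuple


def top_and_bottom_logits(
    logit_effect: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) lists of (token, score) pairs.

    positive: the k highest scores, largest first.
    negative: the k lowest scores, most negative first.
    """
    ranked = sorted(logit_effect.items(), key=lambda item: item[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy input using a handful of entries from this page, not a full vocabulary.
    scores = {
        " offending": -1.65,
        "lessly": -1.57,
        " safely": -1.51,
        "GPU": 1.72,
        "Flight": 2.08,
    }
    pos, neg = top_and_bottom_logits(scores, k=2)
    print(pos)  # [('Flight', 2.08), ('GPU', 1.72)]
    print(neg)  # [(' offending', -1.65), ('lessly', -1.57)]
```

    With a real vocabulary of tens of thousands of tokens the two lists never overlap for k = 10; on a tiny toy dictionary, a large k would return some tokens in both lists.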
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.