INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.08
    1:0.07
    2:0.08
    3:0.08
    4:0.08
    5:0.08
    6:0.07
    7:0.08
    8:0.07
    9:0.10
    10:0.08
    11:0.07
    Negative Logits
    tips          -2.10
    oufl          -1.99
    umbn          -1.95
    estyles       -1.93
     Inv          -1.85
    ulner         -1.85
    anmar         -1.83
     Pil          -1.82
    arij          -1.81
     helicop      -1.79
    Positive Logits
     negro        1.98
    VILLE         1.98
    RON           1.66
     rationality  1.63
     correction   1.63
     tyranny      1.52
     Genie        1.52
     injustice    1.49
     irrational   1.48
     eternal      1.47
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.