INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.08
    1:0.07
    2:0.07
    3:0.07
    4:0.09
    5:0.08
    6:0.09
    7:0.07
    8:0.09
    9:0.08
    10:0.07
    11:0.07
    Negative Logits
     Intercept:-1.83
     undercover:-1.61
     capt:-1.60
     Investigations:-1.59
     infring:-1.59
     compr:-1.55
     expose:-1.51
     transl:-1.49
     downloaded:-1.48
     insert:-1.44
    Positive Logits
    ecause:2.39
    ebted:2.02
    Û:1.89
    JUST:1.85
    BLIC:1.83
    TRUMP:1.81
     Dhabi:1.75
    isans:1.67
     Buddh:1.66
    metics:1.65
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.