INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.08
    1:0.08
    2:0.08
    3:0.07
    4:0.09
    5:0.07
    6:0.07
    7:0.08
    8:0.07
    9:0.07
    10:0.09
    11:0.07
    Negative Logits
    " offending": -1.53
    " peripheral": -1.49
    " predetermined": -1.49
    " specific": -1.48
    " conditional": -1.46
    " tagging": -1.44
    " appropriation": -1.42
    " mismatch": -1.40
    " intent": -1.39
    " unrelated": -1.37
    Positive Logits
    "aug": 1.78
    "apeake": 1.77
    "DragonMagazine": 1.76
    " Franch": 1.70
    "estern": 1.66
    (token not rendered): 1.61
    "spir": 1.60
    "cephal": 1.56
    "illin": 1.56
    "terday": 1.54
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.