    Head Attribution Weights
    Head 0: 0.06
    Head 1: 0.05
    Head 2: 0.09
    Head 3: 0.08
    Head 4: 0.07
    Head 5: 0.10
    Head 6: 0.08
    Head 7: 0.07
    Head 8: 0.08
    Head 9: 0.08
    Head 10: 0.10
    Head 11: 0.08
    Negative Logits
    "ntil"              -1.92
    "onomy"             -1.86
    "interstitial"      -1.67
    " constitu"         -1.64
    " Nanto"            -1.64
    "control"           -1.59
    "iaries"            -1.56
    "laws"              -1.55
    "alf"               -1.55
    (token not shown)   -1.54
    Positive Logits
    " Fahrenheit"       1.57
    " sinking"          1.57
    " spoof"            1.51
    " nude"             1.46
    "Untitled"          1.45
    " itself"           1.44
    " sketch"           1.43
    " ath"              1.42
    " pseudonym"        1.42
    " jet"              1.40
    Activation Density: 0.000%

    No Known Activations