    Head Attribution Weights (head:weight)
    0:0.08
    1:0.12
    2:0.08
    3:0.08
    4:0.09
    5:0.07
    6:0.07
    7:0.06
    8:0.08
    9:0.07
    10:0.07
    11:0.08
    Negative Logits (token, logit)
    "Interstitial"   -1.82
    "verage"         -1.77
    "ulative"        -1.68
    (not shown)      -1.65
    "ה"              -1.62
    "Measure"        -1.60
    "ceed"           -1.59
    "brance"         -1.56
    "feld"           -1.56
    "earth"          -1.56
    Positive Logits (token, logit)
    "gif"            1.92
    "zu"             1.84
    " Zo"            1.83
    " Rasm"          1.76
    "joy"            1.75
    " Roose"         1.72
    " Helic"         1.69
    " QR"            1.67
    " sleepy"        1.64
    " Pie"           1.64
    Activation Density: 0.000%

    No Known Activations