INDEX
    Explanations

    requests for audience feedback and thoughts

    Head Attr Weights
    0:0.03
    1:0.01
    2:0.09
    3:0.11
    4:0.14
    5:0.03
    6:0.06
    7:0.22
    8:0.04
    9:0.05
    10:0.06
    11:0.10
    Negative Logits
     faked          -1.24
    ties            -1.23
     suits          -1.22
    pred            -1.18
     whiff          -1.17
     dinosaurs      -1.15
     videot         -1.14
     vanished       -1.12
     satellites     -1.12
    ��極             -1.12
    Positive Logits
     clarification  1.45
     educate        1.42
     inquire        1.42
     accordingly    1.41
    zai             1.40
     cautiously     1.40
    arse            1.39
     opin           1.38
     someday        1.38
     enthusi        1.38
    Act Density 0.004%

    No Known Activations