INDEX
    Explanations

    expressions of knowledge and awareness

    Head Attr Weights
    0:0.12
    1:0.12
    2:0.03
    3:0.04
    4:0.04
    5:0.22
    6:0.03
    7:0.03
    8:0.14
    9:0.07
    10:0.06
    11:0.05
    Negative Logits
     termination     -1.55
     consolidation   -1.54
     merging         -1.53
     remaining       -1.51
     terminating     -1.46
     merger          -1.40
     rollout         -1.40
     failure         -1.39
     collapsing      -1.39
     continuation    -1.37
    Positive Logits
    ESSION                 1.48
    idian                  1.47
    TextColor              1.45
    [undecodable bytes]    1.44
    OOD                    1.43
    hour                   1.42
    hours                  1.38
    Professor              1.36
    mosp                   1.36
    GoldMagikarp           1.34
    Act Density 0.014%

    No Known Activations