INDEX
    Explanations

    phrases related to communication and updates

    Negative Logits
    Token             Logit
    "boa"             -0.18
    "ghi"             -0.16
    "rawer"           -0.14
    "INGLE"           -0.14
    "Mounted"         -0.14
    "uib"             -0.14
    "trand"           -0.13
    ".Unsupported"    -0.13
    "ollen"           -0.13
    "olle"            -0.13
    Positive Logits
    Token             Logit
    " informed"       0.44
    " aware"          0.42
    "aware"           0.41
    " awareness"      0.37
    "-aware"          0.35
    " Awareness"      0.34
    "Aware"           0.34
    " Aware"          0.34
    " notified"       0.29
    " priv"           0.28
    Activation Density: 0.131%

    No Known Activations