INDEX
    Explanations

    Questions that seek information or clarification

    Questions starting with "What"

    Negative Logits
        Token               Logit
        "<bos>"             -0.69
        " suivantes"        -0.66
        "StructEnd"         -0.65
        "WriteTagHelper"    -0.59
        "HandlerContext"    -0.58
        " suivants"         -0.57
        " mencionados"      -0.57
        "DeleteBehavior"    -0.57
        "abestanden"        -0.56
        "ImageContext"      -0.55

    Positive Logits
        Token               Logit
        " kind"              1.26
        " kinds"             1.13
        " sort"              1.06
        " type"              0.99
        " types"             0.99
        "kind"               0.97
        " sorts"             0.94
        " role"              0.87
        " happens"           0.86
        "sort"               0.85
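    The page does not say how these logit lists were computed. On feature dashboards of this kind, a common approach projects the feature's decoder direction through the model's unembedding matrix, then reports the tokens with the most positive and most negative scores. The following is a minimal pure-Python sketch under that assumption only. It ignores any final normalization or bias, and `logit_effects`, `top_and_bottom`, and the toy weights are hypothetical.

```python
def logit_effects(decoder_direction, unembedding):
    # unembedding[j][t]: weight from residual dimension j to vocab token t (assumed layout).
    # Returns one score per vocab token: the dot product of the decoder direction
    # with that token's unembedding column.
    vocab_size = len(unembedding[0])
    return [
        sum(decoder_direction[j] * unembedding[j][t] for j in range(len(decoder_direction)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, vocab, k=10):
    # Sort token indices by score (ascending), then take each end of the ranking.
    order = sorted(range(len(scores)), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]             # most negative first
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]  # most positive first
    return positive, negative


# Hypothetical toy model: 2 residual dimensions, 6 vocab tokens.
vocab = [" kind", " sort", " type", "<bos>", "StructEnd", " happens"]
decoder_direction = [1.0, 0.5]
unembedding = [
    [1.0, 0.5, 0.25, -1.0, -0.5, 0.125],
    [0.5, 0.5, 0.0, -0.5, 0.0, 0.5],
]

positive, negative = top_and_bottom(logit_effects(decoder_direction, unembedding), vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

    With these toy weights, " kind" and " sort" rank highest and "<bos>" and "StructEnd" rank lowest. The numbers are illustrative, not the values on this page.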
    Activation density: 0.150%
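    Activation density is usually read as the fraction of token positions on which the feature fires. Under that reading, 0.150% means roughly 1.5 activating tokens per 1,000. The page does not state its threshold or evaluation dataset. The following is a minimal sketch that assumes "fires" means an activation strictly above zero. `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    # Fraction of token positions whose activation exceeds the threshold
    # (assumed definition; the dashboard's exact rule is not stated).
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 2,000 token positions, 3 of which activate the feature.
sample = [0.0] * 1997 + [0.4, 1.2, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")
```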

    No Known Activations