INDEX
    Explanations

    specific examples or instances within a context or scenario

    phrases that introduce examples or case studies

    Negative Logits
    "inately"         -0.80
    "essed"           -0.77
    "ournal"          -0.76
    " priority"       -0.70
    "istance"         -0.68
    "ibles"           -0.68
    "imately"         -0.67
    "ief"             -0.65
    "quire"           -0.62
    " accessory"      -0.62
    Positive Logits
    " illustrate"     1.05
    " Suppose"        1.01
    " illustrates"    0.98
    "Example"         0.98
    " illustrating"   0.95
    " illust"         0.91
    "example"         0.90
    " Example"        0.90
    " examples"       0.80
    "Examples"        0.78
    Activation Density: 0.231%

    No Known Activations