INDEX
    Explanations

    Attends to instances of "know" from "know", and to instances of "mean" from "mean".

    Head Attr Weights
    0:0.06
    1:0.18
    2:0.05
    3:0.05
    4:0.10
    5:0.37
    6:0.10
    7:0.06
    Negative Logits
    Token                  Logit
    " EconPapers"          -0.59
    (token not shown)      -0.51
    "Autoritní"            -0.51
    "InjectAttribute"      -0.51
    "IsMutable"            -0.50
    "AccessorTable"        -0.50
    (token not shown)      -0.50
    "ArrowToggle"          -0.50
    " ModelExpression"     -0.48
    "SourceChecksum"       -0.48
    Positive Logits
    Token                  Logit
    " N"                   0.23
    " a"                   0.22
    " No"                  0.20
    " new"                 0.20
    "a"                    0.20
    " Now"                 0.20
    "Fl"                   0.19
    " Fl"                  0.19
    "."                    0.19
    " now"                 0.19
    Act Density 0.005%

    No Known Activations