    Explanations

    phrases related to quotations or reported speech

    mentions of a specific character or element, represented by the unique activation pattern

    Negative Logits
    Token            Logit
    " minim"         -0.78
    "OTOS"           -0.70
    " scatter"       -0.68
    " coffin"        -0.67
    " decomp"        -0.67
    " cyan"          -0.67
    " coast"         -0.66
    " fairy"         -0.66
    " scene"         -0.66
    " protective"    -0.66
    Positive Logits
    Token            Logit
    "cause"          0.91
    "then"           0.89
    "said"           0.88
    "according"      0.88
    "since"          0.85
    "especially"     0.84
    "¯"              0.83
    "_>"             0.79
    "sure"           0.77
    "yet"            0.77
    Activation Density: 0.181%

    No Known Activations