INDEX
    Explanations

    phrases and contexts emphasizing the importance of facts and factual accuracy

    Negative Logits
    "dings"    -0.16
    "trace"    -0.15
    "ati"      -0.15
    "§"        -0.14
    "/fw"      -0.14
    "πό"       -0.14
    "atic"     -0.14
    "loo"      -0.14
    " Silk"    -0.14
    " Leer"    -0.13
    Positive Logits
    "itious"    0.23
    "ually"     0.22
    " facts"    0.19
    "facts"     0.18
    "fully"     0.18
    "사항"      0.17
    "ンズ"      0.17
    "oring"     0.17
    "fulness"   0.16
    "nel"       0.16
    Activation Density: 0.026%

    No Known Activations