INDEX
    Explanations

    occurrences of the word "in"

    Negative Logits
    Token               Logit
    "sofar"             -0.21
    " relation"         -0.20
    "relation"          -0.19
    "ved"               -0.19
    "reo"               -0.18
    "agar"              -0.17
    " Relation"         -0.17
    "duct"              -0.17
    " regards"          -0.16
    "inder"             -0.16
    Positive Logits
    Token               Logit
    " truth"            0.27
    " typical"          0.21
    " true"             0.19
    "truth"             0.18
    " reality"          0.18
    "istrovství"        0.16
    " characteristic"   0.15
    "一页"              0.15
    " spirit"           0.15
    "_truth"            0.15
    Act Density 0.102%

    No Known Activations