INDEX
    Explanations

    book titles and citations

    NEGATIVE LOGITS
    Token           Value
    " Out"          0.46
    " meant"        0.41
    " Right"        0.39
    " meaning"      0.39
    " True"         0.38
    " Meaning"      0.38
    " it"           0.38
    " all"          0.37
    " Meanwhile"    0.37
    " Hungry"       0.37
    POSITIVE LOGITS
    Token           Value
    ":"             0.57
    " libro"        0.55
    ":..."          0.54
    " boek"         0.54
    " :"            0.52
    "?:"            0.50
    " книга"        0.50
    " kitabı"       0.50
    (not rendered)  0.50
    " livro"        0.49
    Act Density 0.006%

    No Known Activations