INDEX
    Explanations

    references to authors and manuscripts related to academic work

    Negative Logits
    " myſelf"       -0.59
    " verksamhet"   -0.56
    " juſ"          -0.56
    " rodríguez"    -0.54
    " ſever"        -0.52
    " faſt"         -0.51
    " ſta"          -0.50
    " paſſ"         -0.49
    " fernández"    -0.49
    " färg"         -0.49
    Positive Logits
    "ç"             0.56
    " theyre"       0.54
    "BO"            0.53
    "Authors"       0.52
    " youre"        0.50
    "bo"            0.50
    " thats"        0.49
    "ArrowToggle"   0.47
    "期刊论文"       0.47
    "談社"           0.47
    Activation Density: 0.247%

    No Known Activations