    Explanations

    languages and grammar

    Negative Logits
    disk            -0.08
    _disk           -0.08
    Disk            -0.08
     Hospital       -0.07
     contenant      -0.07
     curated        -0.07
     Disk           -0.07
    .disk           -0.07
     Tanks          -0.07
     gimnasio       -0.07
    Positive Logits
    不像            0.12
    Grammar         0.10
     grammatical    0.10
     grammar        0.10
    Unlike          0.10
     vowel          0.10
    Pron            0.10
     notoriously    0.10
     pronunciation  0.10
     língua         0.09
    Activation Density: 0.020%

    No Known Activations