INDEX
    Explanations

    numbers and numerical factors

    Negative Logits
        "],               -1.06
        ']")              -1.02
        "}},              -0.99
        "]);              -0.98
         autorytatywna    -0.98
        "):               -0.98
        "])               -0.98
        }")               -0.97
         ―――――            -0.96
        )"),              -0.95
    Positive Logits
        1      2.00
        2      1.11
        0      1.06
        3      1.00
        5      0.95
        6      0.87
        4      0.86
        9      0.85
        7      0.80
               0.77
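    The two logit lists rank vocabulary tokens by how strongly this feature pushes their next-token logits up (positive) or down (negative). Below is a minimal sketch of how such lists can be produced, assuming, as is common for sparse-autoencoder dashboards but not stated on this page, that each token's score is the feature's decoder direction projected through the model's unembedding matrix. The function name, argument names, and toy arrays are hypothetical.

```python
import numpy as np


def top_logit_effects(decoder_vec, unembed, vocab, k=10):
    """Return the k most-promoted and k most-suppressed tokens for one feature.

    Assumption (hypothetical setup): decoder_vec has shape [d_model],
    unembed has shape [d_model, vocab_size], and vocab maps index -> token string.
    """
    # One score per vocabulary token: how much the feature direction
    # raises (positive) or lowers (negative) that token's logit.
    effects = decoder_vec @ unembed
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
decoder_vec = np.array([1.0, 0.0])
unembed = np.array([[2.0, -1.0, 0.5, 0.0],
                    [0.0, 0.0, 0.0, 0.0]])
vocab = ["1", "]", "2", "x"]
positive, negative = top_logit_effects(decoder_vec, unembed, vocab, k=2)
print(positive)  # [('1', 2.0), ('2', 0.5)]
print(negative)  # [(']', -1.0), ('x', 0.0)]
```

    In this reading, a feature whose top positive tokens are digits and whose negative tokens are closing brackets and quotes would push the model toward emitting a number rather than closing a string or bracket.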
    Activation Density: 1.913%
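    Activation density is the share of token positions on which the feature fires; 1.913% means it is active on roughly 1 token in 52. A minimal sketch of the calculation, assuming "fires" means an activation strictly above a threshold of zero (the function name and the threshold parameter are hypothetical):

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold.

    Assumption: activations is a flat sequence of per-token activations for
    one feature; a token counts as "active" only if its value is > threshold.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy example: the feature fires on 2 of 10 tokens.
print(activation_density([0, 0.3, 0, 0, 1.2, 0, 0, 0, 0, 0]))  # 20.0
```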

    No Known Activations