Explanations
concept followed by explanation
Negative logits
Token      Value
...')      1.16
")         1.08
इत्यादि     1.04   (Hindi, "etc.")
...")      1.02
]`         0.99
)"         0.98
)#         0.97
などは      0.96   (Japanese, roughly "things such as")
)")        0.94
...)       0.93
Positive logits
Token      Value
—          1.23
—          1.14
——         1.04
–          1.01
–          0.96
––         0.95
:          0.95
—“         0.94
€”         0.92
—.         0.89
Activation density: 0.375%