Explanations
terminology and metrics related to circuits and their performance
Negative Logits
Token         Logit
LEncoder      -0.62
quels         -0.54
[…]           -0.48
Демографія    -0.45
MDB           -0.45
конец         -0.44
pédie         -0.44
cutt          -0.44
Many          -0.43
long          -0.43
Positive Logits
Token         Logit
averaging     1.12
averaged      1.06
totaling      1.05
averages      1.04
totalling     1.04
totalled      0.96
totaled       0.95
amounting     0.94
average       0.91
estimated     0.88
Activations Density 0.921%
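
As a rough illustration of the density figure above, the sketch below assumes that "activation density" means the fraction of sampled token positions where the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (# active positions) / (# positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 100 token positions.
sample = [0.0] * 99 + [1.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.921% would mean the feature is active on a little under 1 in 100 sampled tokens.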