Koko loves to eat bananas. There are N piles of bananas; the i-th pile has piles[i] bananas. The guards have gone and will come back in H hours.
Koko can decide her bananas-per-hour eating speed K. Each hour, she chooses some pile of bananas and eats K bananas from that pile. If the pile has fewer than K bananas, she eats all of them instead and won't eat any more bananas during this hour. Koko likes to eat slowly, but still wants to finish eating all the bananas before the guards come back.
Return the minimum integer K such that she can eat all the bananas within H hours.
Input: piles = [3,6,7,11], H = 8
Output: 4

Input: piles = [30,11,23,4,20], H = 5
Output: 30

Input: piles = [30,11,23,4,20], H = 6
Output: 23
The way to tackle this question is to find the minimum possible K. The value of K is clearly limited to the range from 1 to the greatest number in the piles array: once K equals the largest pile, every pile takes exactly one hour, so any larger K cannot shorten the total time.
Of course we could iterate from 1 to the greatest number to find K, but why not use binary search? The way it works is that if we find some K that doesn't allow Koko to eat all the bananas within the time limit H, then any number smaller than K will not work either. The same goes for the opposite case: if we find some K that does allow Koko to enjoy all the bananas within time H, we can safely try smaller numbers, as they might also satisfy the condition.
Now that we have a nice way to find K, the only thing left is to check whether a K is good (Koko gets to eat all the bananas within H hours) or not (Koko gets caught by the guards). We can simply go through the array and calculate the time needed according to the description.
class Solution { ... }
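Only the first line of the original code block survived. A minimal C++ sketch of the approach described above, using the `finish` and `minEatingSpeed` names from the text (the original implementation may have differed in detail), could look like:

```cpp
#include <vector>
#include <algorithm>
using namespace std;

class Solution {
public:
    // Can Koko finish every pile within H hours at speed k?
    bool finish(const vector<int>& piles, int k, int H) {
        long long hours = 0;
        for (int p : piles)
            hours += ((long long)p + k - 1) / k;  // ceil(p / k) hours per pile
        return hours <= H;
    }

    // Binary-search the smallest workable speed in [1, max(piles)].
    int minEatingSpeed(const vector<int>& piles, int H) {
        int lo = 1, hi = *max_element(piles.begin(), piles.end());
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (finish(piles, mid, H))
                hi = mid;        // mid works: try something smaller
            else
                lo = mid + 1;    // mid is too slow: go larger
        }
        return lo;
    }
};
```

For the first example, the search narrows to speed 4: at speed 3 the piles take 1 + 2 + 3 + 4 = 10 hours, while at speed 4 they take 1 + 2 + 2 + 3 = 8 hours, exactly H.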
Suppose n is the length of the piles array. The method finish takes O(n), as it iterates over the array once. In minEatingSpeed, finish is called inside the binary search loop. Therefore, the time complexity is O(n log K), where K is the largest pile, and the space complexity is O(1).
There is still something we could improve in the finish method. If we first sort the piles array in minEatingSpeed, then finish can use binary search as well, to find the first pile larger than K: every pile before it needs exactly one hour, so there is no need to iterate over those elements.
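A sketch of this variant, assuming the piles array has already been sorted ascending in minEatingSpeed (std::upper_bound does the binary search; this is an illustration, not the original code):

```cpp
#include <vector>
#include <algorithm>
using namespace std;

// Variant of finish that assumes `piles` is sorted ascending. Every pile
// holding at most k bananas costs exactly one hour, so upper_bound locates
// the first pile larger than k and the loop only divides for the piles
// after that point.
bool finishSorted(const vector<int>& piles, int k, int H) {
    auto first = upper_bound(piles.begin(), piles.end(), k);
    long long hours = first - piles.begin();       // one hour per small pile
    for (auto it = first; it != piles.end(); ++it)
        hours += ((long long)*it + k - 1) / k;     // ceil(pile / k)
    return hours <= H;
}
```

The asymptotic complexity of one check stays O(n) in the worst case (all piles may exceed K), but when many piles are at most K the loop body runs far fewer times.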
We want to know whether FPGA/OpenCL and GPU/CUDA are possible solutions to accelerate blockchain applications (specifically the SHA256 hashing algorithm used during the Bitcoin mining process), so we implemented the algorithm in different ways and compared their cost-benefit and performance. To do so, we examine the hash algorithm on files and try to parallelize the implementation for acceleration.
In the process of Bitcoin mining, it is important to have a cryptographic hash algorithm to help keep Bitcoin's blockchain secure. Hash algorithms are considered the building blocks of the cryptocurrency industry today.
In Bitcoin mining, the hardware runs the SHA256 cryptographic hash algorithm twice for the proof of work. For every try of a new hash value, the software uses a different number as the nonce until it hits a value lower than the target bits. Thus, the SHA256 cryptographic hash algorithm is one of the most significant factors in Bitcoin mining speed.
SHA256 is one of the most popular hashing algorithms in SHA2, a set of cryptographic hash functions designed by the United States National Security Agency. Simply put, for a message of arbitrary length, SHA256 generates a unique 256-bit hash, called the message digest. In most cases, it is represented by a hex string of length 64.
The SHA256 algorithm can be divided into three modules: constant value initialization, message preprocessing, and message digest calculation.
For the constant value initialization module, there are eight initial hash values (Figure 1) and 64 hash constants (Figure 2). The eight initial hash values come from the fractional parts of the square roots of the first eight prime numbers. Similarly, the 64 hash constants come from the fractional parts of the cube roots of the first 64 prime numbers. These values are used in the message digest calculation part.
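As a side note, these constants can be reproduced in a few lines. The sketch below (fractionalBits is an illustrative helper, not part of the implementation discussed here) assumes double precision is accurate enough for the first 32 fractional bits, which holds for these roots:

```cpp
#include <cmath>
#include <cstdint>

// First 32 bits of the fractional part of a root: the recipe behind
// SHA256's initial hash values (square roots of the first 8 primes)
// and round constants (cube roots of the first 64 primes).
uint32_t fractionalBits(double root) {
    return (uint32_t)((root - floor(root)) * 4294967296.0);  // * 2^32
}
```

For example, fractionalBits(sqrt(2.0)) yields 0x6a09e667, the first initial hash value, and fractionalBits(cbrt(2.0)) yields 0x428a2f98, the first of the 64 round constants.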
The purpose of the preprocessing module is to shape the input into the structure that the message digest calculation module expects. The first step is to append filler bits (the first filler bit is 1, and the rest are all 0s) until the length of the message modulo 512 equals 448. The second step is to append the length information, a 64-bit field, after the filler bits. Since 448 + 64 = 512, the length of the input will be a multiple of 512 after passing through the preprocessing module.
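The resulting length can be computed directly. The helper below (paddedBits is a hypothetical name for illustration, not a function from the SHA256 code discussed here) takes the message length in bits:

```cpp
#include <cstdint>

// Length in bits of a message after SHA256 preprocessing: the message,
// a single 1 bit, zero padding up to 448 (mod 512), then the 64-bit
// length field. The result is always a multiple of 512.
uint64_t paddedBits(uint64_t len) {
    uint64_t zeros = (448 - (len + 1) % 512 + 512) % 512;  // zero bits needed
    return len + 1 + zeros + 64;
}
```

An empty message pads to one 512-bit block, while a 448-bit message already occupies the length-field area and so needs a second block.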
In the message digest calculation module, the first step is to break the input message from the preprocessing module into 512-bit blocks. The initial value of the message digest is H0. H0 is passed into the calculation of block 1, which generates H1, and so on. After going through all the blocks, the result is Hn, the final message digest output.
In our previous plan, the optimization focused on the message digest calculation module (the function void transform(const unsigned char *, unsigned int)). For the other two modules, since the operations are almost entirely serial and of low complexity, there would be no obvious speedup from using an FPGA or GPU.
Having learned about the differences in architecture between the CPU and other hardware, we believe it is possible to use an FPGA (Field Programmable Gate Array) and a GPU to accelerate SHA256.
Even though there are parallel circuits in the framework of a CPU, on the software level it can only perform serial execution because of the operating system and other high-level languages.
An FPGA is a semiconductor device composed of an array of lookup tables (LUTs). A LUT can be viewed as a signal calculation unit or a simple logic gate that accepts a few inputs, calculates the output, and stores the output in SRAM memory cells. Routing channels run between the logic blocks and connect LUTs together, so an FPGA can work as a combination of logic gates. As a result, an FPGA can process multiple inputs and produce multiple outputs at the same time without interference.
The GPU (Graphics Processing Unit) was originally designed for displaying images. When it processes graphics, a large number of calculations need to be done at the same time during different stages. For example, in the rendering stage, the GPU computes the vertex shader, geometry shader, and pixel shader to generate the most accurate output. A GPU is composed of many ALUs that process multiple pixels simultaneously, and each ALU can be viewed as a "thread". The bandwidth of the GPU is also high, to keep the ALUs fed. Even though each ALU is not very powerful, the usual add, multiply, and divide operations are supported. Similar to the FPGA, the GPU is built for parallel calculation at the hardware level.
This characteristic of FPGAs and GPUs can be helpful for the SHA256 algorithm. Replacing parts of the serial calculation in SHA256 with hardware-supported parallel calculation would be a great improvement.
We will use the GPGPU model for both devices. A GPGPU application contains a host program and a kernel. The host program is a C++ CPU program used to handle inputs and outputs and to call kernel APIs. The kernel is a program running on the device (FPGA or GPU). For the GPU, we will use CUDA; for the FPGA, we will use OpenCL.
After careful examination of the SHA256 algorithm, we found that the execution of a single SHA256 hash cannot be parallelized because of the way the SHA transform function is implemented. This section presents the incorrect CUDA version of SHA256. The OpenCL version is not provided, since it would also be a parallel algorithm that SHA256 does not support.
The transform function operates on blocks of fixed size, but each new transformation requires the output of the previous one, so there is no way to run the computations in parallel. As the code shows, the SHA_UNPACK32 and SHA_PACK32 functions in the header of the C++ version of the algorithm require the value of the previous output. If we use CUDA or an FPGA to do this step in parallel, the str output will not be correct.
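To make the dependency concrete, here is a sketch in which compress is a TOY stand-in for the real 64-round compression function (its arithmetic is made up; only the shape of the data flow matters):

```cpp
#include <array>
#include <cstdint>
#include <vector>
using namespace std;

using State = array<uint32_t, 8>;   // the eight 32-bit hash words

// Toy stand-in for the real SHA256 compression function, which mixes one
// 512-bit block into the running state over 64 rounds.
State compress(State h, const array<uint32_t, 16>& block) {
    for (int i = 0; i < 16; ++i)
        h[i % 8] = ((h[i % 8] << 5) | (h[i % 8] >> 27)) ^ block[i];
    return h;
}

// Block i cannot be processed before block i-1 has finished, because each
// call consumes the previous call's output. This strict chain is what
// prevents parallelizing a single hash.
State digest(State h0, const vector<array<uint32_t, 16>>& blocks) {
    State h = h0;
    for (const auto& b : blocks)
        h = compress(h, b);   // H_{i+1} depends on H_i
    return h;
}
```

Swapping the order of two blocks changes the final state, so the per-block calls cannot be distributed across independent threads the way independent work items can.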
In our code, we tried to rearrange the for loops in the transform function so that we could use as few kinds of kernels as possible. We moved the for loop that stores data into each word before the encryption for loop and put it into kernel 1, as the code below shows.
__global__ void kernel1(unsigned char* sub_block_d, unsigned int *w_d, unsigned int *wv, unsigned int *m_h) { ... }
In kernel2, we put the for loop that encrypts the words n times. In fact, this part has to run in order.
The image above shows one iteration of the SHA256 transformation. The original array is sent through different operations and then becomes the input for the next transformation. The four blue components stand for the computations Ch, Maj, Σ0, and Σ1.
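These four round functions are short enough to write out; the definitions below follow FIPS 180-4:

```cpp
#include <cstdint>

// The four round functions of SHA256 as defined in FIPS 180-4.
static uint32_t rotr(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

// Ch chooses bits of y or z depending on x; Maj is the bitwise majority.
uint32_t Ch(uint32_t x, uint32_t y, uint32_t z)  { return (x & y) ^ (~x & z); }
uint32_t Maj(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (x & z) ^ (y & z); }

// Sigma0 and Sigma1 XOR three rotations of their input.
uint32_t Sigma0(uint32_t x) { return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22); }
uint32_t Sigma1(uint32_t x) { return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25); }
```

All four are pure bitwise functions, which is exactly why a single round maps so cheaply onto FPGA logic; it is only the chaining between rounds and blocks that is serial.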
The third kernel is used to combine the eight words into a single hash for block i.
__global__ void kernel3(unsigned int *wv, unsigned int *m_h) { ... }
SHA256 in the C++ version:

sha256('apple'): 3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b

SHA256 in the CUDA version (incorrect output):

sha256('apple'): e9f9b813257194ecf7d6a1f7e1bee8ac6dbd7e80ec13bb0bba8942377b64a6c4
Possible Solution
Since a single SHA256 hash cannot be accelerated by parallel computing, there is no way to make the algorithm faster for one input string. However, the algorithm can still run in parallel when there are multiple input strings, and the speedup can still be measured. Also, for a long string, we could divide it into multiple substrings and run the hash algorithm on them in parallel; this would allow us to hash different parts of the string and then combine the digests. However, it is uncertain what effect putting all of the hash functions into a single kernel would have, considering that there are already different transformation and update functions. Because it takes time to copy the required data arrays into each kernel, running everything in multiple kernels could possibly slow down the hashing process. This requires further research and implementation.
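As a CPU-side illustration of the multiple-input idea, the sketch below launches one task per input with std::async; fakeHash (built on std::hash, which is NOT cryptographic) is a stand-in marking where the real sha256 call or kernel launch would go:

```cpp
#include <future>
#include <functional>
#include <string>
#include <vector>
using namespace std;

// Non-cryptographic stand-in for sha256(): std::hash only marks where the
// real hash computation (or a CUDA/OpenCL kernel launch) would happen.
string fakeHash(const string& s) {
    return to_string(hash<string>{}(s));
}

// Each input's digest depends only on that input, so hashing many
// independent strings is embarrassingly parallel: launch one task per
// input and collect the results in order.
vector<string> hashAll(const vector<string>& inputs) {
    vector<future<string>> jobs;
    for (const auto& s : inputs)
        jobs.push_back(async(launch::async, fakeHash, s));
    vector<string> out;
    for (auto& j : jobs)
        out.push_back(j.get());
    return out;
}
```

On a GPU the same structure becomes one thread (or work item) per message, which is how mining hardware gets its throughput despite each hash being serial internally.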
There is always a start.
——————————by someone smart
The famous "Hello World" program, almost the first lesson for every programming learner in basically every language, originated from the C Programming Language book's predecessor, A Tutorial Introduction to the Language B, published in 1973:
main( ) {
    extrn a, b, c;
    putchar(a); putchar(b); putchar(c); putchar('!*n');
}

a 'hell';
b 'o, w';
c 'orld';
The author, Brian Kernighan, said in an interview:
What I do remember is that I had seen a cartoon that showed an egg and a chick and the chick was saying, “Hello, World.”
He himself might not even have realized that this simple sentence would lead millions of people to the path of the software engineer, the programmer (26.4 million, specifically, according to Daxx's research; of course, I wonder about the count for "real pros"). Nor do many realize what a highlight in history this could be. Some wouldn't even care about "Hello World"; they might just print it and continue to the next task in the tutorial (probably loops or something). Without a doubt, though, it is "Hello World" that calls programmers into this very popular position, balding issues and all.
Why do we value such a simple sentence? It seems that for everything, we hope to find where it came from, even though it matters little. "Hello World" only exists in demos and introductions in books; you probably won't find it in any slightly larger program, and even the example itself is defective, since it lacks a return type and a return statement. Yet we are still thrilled to learn how everything started.
Why?
I have no answer here, but as I said, there is always a start. We look for the start not because of how helpful it can be, but for the experience we had, which defines our personality and identity. We seek to know the origin because we hope that, during this long journey, we are still holding on to our intentions from the beginning.
I hope I can follow this, which is why I am writing this very first blog. I want it to be a reminder whenever I look back at my footsteps, hoping I am still on the trail.