This is the latest report card for Project Loon, Google’s “balloon Internet” project. Project Loon was officially launched in June 2013 and is now part of Google’s parent company, Alphabet. It aims to combine AI technology with superpressure balloons to provide low-cost, high-speed wireless Internet service to more regions, especially remote areas far from city centers.

Yesterday, the latest analysis showed that during a 39-day flight test over the Pacific Ocean, Loon balloons performed better than ever before. With the latest artificial intelligence system, a balloon computes its best navigation path faster, spends more time over the target area, and consumes less energy. More importantly, the system came up with new navigation maneuvers that the research team had never thought of. This latest AI system is based on a reinforcement learning (RL) algorithm.

The researchers say this is the first time they have applied an RL system to an aerospace product. Loon’s achievement shows that RL can be an effective solution to real-world autonomous control problems.

Next, let’s look at why Google launched the “balloon Internet” program and what problems the reinforcement learning system solves. In 2013, the high-altitude Internet service project was officially launched to bring Internet coverage to the remote areas that are home to more than 3 billion people. In the following years, many technology companies joined the race, such as SpaceX and OneWeb. The most noteworthy is Musk’s “space Internet” project. He plans to launch 42,000 communication satellites, forming a giant constellation in low Earth orbit that communicates with the ground. So far, nearly 900 satellites have been launched successfully.
Loon takes a different route and uses high-altitude balloons in the upper atmosphere. Each balloon is made to float up and down so that it can catch winds blowing in the right direction and stay within a fixed area.

Finally, “mesh networking” technology is used to pass Internet packets from one balloon to another and from the balloons to home and business users who have set up antennas on their roofs; the data from these users is then relayed back out the same way.

In this process, the longer a balloon stays aloft in the stratosphere, the more long-term connectivity Loon can provide to the target area at a lower cost, which means that Internet service will not only cover more remote areas but also be cheaper. In recent years, Loon has repeatedly set world records for stratospheric flight duration, and the longest flight has reached 312 days, nearly a whole year.

The record flight began in May 2019, when the balloon took off from Puerto Rico and headed to Peru, where it flew for three months. After the test, it crossed the South Pacific and landed in Baja, Mexico, in March this year.

The flight broke the previous record of 223 days. Sal Candido, chief technology officer of Loon, wrote on his blog that the record-setting flight was the result of the company’s efforts to develop its technology and to keep upgrading hardware and software in innovative ways. So far, Loon has run test services in Queensland, Australia; Kenya; New Zealand; California’s Central Valley; and northeastern Brazil. Last year, after a hurricane, American telecom operators also used Project Loon to provide network connections for more than 250,000 victims.

This time, the proposed RL system provides a new solution to the current challenges.
Compared with the original balloon navigation system, the RL algorithm shortens decision-making time during flight.

Through reinforcement learning, the controller decides how to act based on the data. The AI can not only make decisions but also make them in real time as the balloon moves.

To provide full network coverage in an area, Loon must run at least 5 to 10 balloons at a time. To expand coverage, it has to call in spare balloons from the surrounding area to form a larger mesh network in the air.

In this process, balloons generally run into the following problems. First, a balloon’s life is cut short and it lands automatically because of factors such as battery failure. Second, a balloon is blown out of its fixed service area by severe weather such as a hurricane.

Figure: (a) The balloon approaches its designated position by moving between winds at different altitudes. (b) The flight path of the balloon. The blue circle represents the 50 km range, the distance the balloon should ideally stay within.

But airflow is unstable. Moving through the sky on the wind is like using a road network in which the streets change direction, number of lanes and speed limits, and even disappear completely at unpredictable times. To cope with this, a more sophisticated algorithm is needed: reinforcement learning. By training flight controllers, RL can learn a control policy that handles high-dimensional, heterogeneous inputs and optimizes long-term objectives. For example, RL has repeatedly defeated top human players in real-time strategy games such as Dota 2 and has performed surprisingly well at long-term strategy.
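To make the idea of moving between winds at different heights more concrete, here is a minimal Python sketch. It is a naive greedy heuristic, not Loon's RL controller and not StationSeeker: it simply picks the altitude layer whose wind would leave the balloon closest to its target after one time step. The function name `best_wind_layer`, the layer names, and all wind values are hypothetical and chosen only for illustration.

```python
import math

def best_wind_layer(balloon_xy, target_xy, winds, dt_hours=1.0):
    """Pick the altitude layer whose wind leaves the balloon closest to the target.

    winds maps a layer name to a (vx, vy) wind velocity in km/h.
    Returns (layer_name, predicted_distance_km).
    """
    bx, by = balloon_xy
    tx, ty = target_xy
    best_layer, best_dist = None, math.inf
    for layer, (vx, vy) in winds.items():
        # Position after drifting with this layer's wind for dt_hours.
        nx, ny = bx + vx * dt_hours, by + vy * dt_hours
        dist = math.hypot(nx - tx, ny - ty)
        if dist < best_dist:
            best_layer, best_dist = layer, dist
    return best_layer, best_dist

# Hypothetical winds (km/h) at three altitude layers.
winds = {
    "low": (20.0, 0.0),    # blows east, away from the target
    "mid": (-15.0, 5.0),   # blows roughly west, toward the target
    "high": (0.0, -10.0),  # blows south
}

# Balloon is 30 km east of a target at the origin.
layer, dist = best_wind_layer((30.0, 0.0), (0.0, 0.0), winds)
print(layer, round(dist, 1))
```

A greedy rule like this only looks one step ahead and assumes the winds are known exactly. The article's point is that real winds are uncertain and keep changing, which is why a controller that optimizes long-term objectives is needed.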
In terms of data sets, the researchers created plausible wind data based on ERA5, the global reanalysis data set of the European Centre for Medium-Range Weather Forecasts (ECMWF), which reinterprets historical weather observations through a model. (ERA5 provides a reference wind that is modified with procedural noise. Varying the random seed that drives the noise generates different high-resolution wind fields, which improves the controller's robustness to modeling error.)

In terms of minimizing power consumption, the average power of the deployed controller was kept below that of StationSeeker (the previous control system), and the objective was encoded as a reward r. When the balloon stays within 50 km of its target, the reward takes its maximum value, r = 1. The reward also depends on the state of the balloon, and at each time step t the controller responds with one of three actions: ascend, descend or stay.

Finally, the computational cost lies mainly in wind estimation. The researchers use a Gaussian process to combine the balloon's measurements with the ECMWF forecast, with the wind forecast serving as the prior mean. The variance of the posterior distribution quantifies the uncertainty of each wind estimate. The controller's input encodes the wind magnitude and relative bearing directly above and below the balloon at 181 pressure levels ranging from 5 kPa to 14 kPa. From December 17, 2019 to January 25, 2020, Loon flew for about 2884 hours. The data were divided into 851 three-hour periods, each treated as an independent sample.

The final test results show that, compared with StationSeeker, the RL controller spends more time in the 25-50 km range by using different strategies according to the wind conditions within 50 km (Fig. 4b), and that when it drifts out of range, it shortens the excursion by actively maneuvering back to the target area (Fig. 4c). At the same time, it uses less energy (Fig. 4d). Finally, the RL controller uses altitude to convert excess solar energy into potential energy (Fig. 4e). These results show that reinforcement learning is an effective solution to real-world autonomous control problems. When the traditional control method (StationSeeker) cannot meet the requirements, it is necessary to create an artificial agent that can continuously interact with the real, dynamic environment.
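The reward and the wind estimate described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the researchers' implementation. The article only says that r = 1 within 50 km, so the decay outside that range (`cliff`, `half_life_km`) is an assumed shape. The squared-exponential kernel, its parameters, and the constant forecast are also assumptions, and the Gaussian process is conditioned on a single balloon measurement to keep the algebra scalar.

```python
import math

RANGE_KM = 50.0  # station-keeping range mentioned in the article

def station_keeping_reward(distance_km, cliff=0.4, half_life_km=100.0):
    """Reward is 1 inside the range (from the article); outside it decays (assumed)."""
    if distance_km <= RANGE_KM:
        return 1.0
    # Assumption: drop to `cliff` at the boundary, then halve every half_life_km.
    return cliff * 2.0 ** (-(distance_km - RANGE_KM) / half_life_km)

def rbf(p1, p2, sigma=5.0, length_kpa=2.0):
    """Squared-exponential covariance between two pressure levels (assumed kernel)."""
    return sigma ** 2 * math.exp(-0.5 * ((p1 - p2) / length_kpa) ** 2)

def wind_posterior(p_query, p_obs, wind_obs, forecast, noise_std=1.0):
    """GP posterior mean and variance of wind speed at p_query.

    forecast(p) is the prior mean (the weather forecast at pressure p, in kPa);
    wind_obs is the single wind measurement the balloon took at pressure p_obs.
    """
    k_qo = rbf(p_query, p_obs)                  # covariance query vs. observation
    k_oo = rbf(p_obs, p_obs) + noise_std ** 2   # observation variance incl. noise
    mean = forecast(p_query) + k_qo / k_oo * (wind_obs - forecast(p_obs))
    var = rbf(p_query, p_query) - k_qo ** 2 / k_oo
    return mean, var

def flat_forecast(p):
    """Hypothetical forecast: 10 m/s at every pressure level."""
    return 10.0

# The balloon measures 14 m/s at its own level (8 kPa).
near_mean, near_var = wind_posterior(8.0, 8.0, 14.0, flat_forecast)
far_mean, far_var = wind_posterior(14.0, 8.0, 14.0, flat_forecast)
print(near_mean, near_var)
print(far_mean, far_var)
print(station_keeping_reward(30.0), station_keeping_reward(150.0))
```

Near the balloon's own pressure level, the estimate moves toward the measurement and the variance shrinks. Far from it, the estimate stays close to the forecast and the variance stays close to the prior, which is the uncertainty signal the controller can take into account.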