Hydraulic fracturing is indispensable to the economic development of shale gas. The flowback of hydraulic fracturing fluid is one of the most important parameters recorded after shale gas wells are put into production, and the flowback ratio is generally used as the flowback indicator. The flowback ratio has a strong influence on shale gas production, but it is affected by many factors whose correlations with it remain unclear. Using a large set of original geological, engineering, and dynamic data acquired from 373 hydraulically fractured horizontal wells, the flowback characteristics were systematically studied with machine learning. Based on data analysis and random forest forecasting, a new indicator, the single-cluster flowback ratio, is proposed, which reflects the inherent relationship between flowback fluid volume and its influencing factors more effectively. Random forest training on this large data set shows that the new indicator has better learnability and predictability than the flowback ratio. A good linear relationship exists between the single-cluster flowback ratios at different production stages, so the 30-day single-cluster flowback ratio can be used to predict the 90-day and 180-day single-cluster flowback ratios. The main controlling factors of production and of the flowback ratio were also systematically analyzed. The main controlling factors of the flowback ratio are found to be the number of fracturing clusters, the total amount of sand, the number of fracturing stages, and the fluid injection intensity per cluster. This study provides a fundamental reference for analyzing hydraulic fracturing fluid flowback in shale gas reservoirs.
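The linear relationship between single-cluster flowback ratios at different production stages can be illustrated with a minimal least-squares sketch. It is not the study's own workflow. The function names (`fit_line`, `r_squared`, `predict`) are hypothetical, and the numbers are synthetic placeholders chosen to lie exactly on a line. They are not data from the 373 wells, and their scale and units are arbitrary.

```python
# Illustrative sketch only: all values below are synthetic placeholders,
# NOT measurements from the study. Names are hypothetical.

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def predict(x_new, slope, intercept):
    """Predict a later-stage value from an early-stage value."""
    return slope * x_new + intercept


# Synthetic 30-day and 90-day single-cluster flowback ratios (arbitrary units).
ratio_30d = [0.10, 0.15, 0.20, 0.25, 0.30]
ratio_90d = [0.21, 0.31, 0.41, 0.51, 0.61]

slope, intercept = fit_line(ratio_30d, ratio_90d)
r2 = r_squared(ratio_30d, ratio_90d, slope, intercept)

# Estimate the 90-day value for a new well from its 30-day value.
estimate_90d = predict(0.18, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
print(f"estimated 90-day value: {estimate_90d:.3f}")
```

The same fit could be repeated with 180-day values as the target. With real well data the points would scatter around the line, and the coefficient of determination would indicate how far the 30-day value can be trusted as a predictor.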