Introduction

Introduced in the late 1990s, genetically engineered (GE) crops sought to revolutionize agriculture through their anticipated benefits, which mainly focused on improving crop survivability through genes conferring resistance or tolerance to environmental factors. But in practice, does the implementation of GE crops increase agricultural productivity? In this document, we seek to answer this question by looking at the progression of yields and chemical applications for corn (maize) in the face of GE adoption.

To answer the research question, and to get as close as possible to determining a causal relationship, we used two approaches: a simple ordinary least squares (OLS) approach and a difference-in-differences (DiD) approach. For the OLS approach, we used data specific to the United States (US), consisting of variables (explained in detail in the next section) observed annually by state. For the DiD approach, we relied on global data on countries and their respective corn yields, observed annually, which allowed us to model a DiD based on GE regulatory approaches.

As for our initial expectations, we expected GE corn to have a positive causal effect on both yield and application efficiency, but we were unsure how large the effect would be. For yields, we thought that because GE crops are not designed to directly increase yields, their effect would be minimal. We did not hold the same view for application efficiency. Specifically, we expected GE presence to have a strong positive effect on application efficiency because many of the genes, such as those conferring natural insect resistance, were designed to limit the use of chemical applications. With these expectations in mind, let’s look at what the data tells us, but first, we will explain the data.

Data Description

OLS Data Frame Main Variables & Sources

ge_pct - The “ge_pct” variable is obtained from the United States Department of Agriculture’s (USDA) Economic Research Service (ERS) and represents the percent of corn planted containing genetic modifications, annually by state from 2000-2019. For the purposes of the OLS, this is our main explanatory variable.

acres_planted - The “acres_planted” variable is obtained from the USDA’s National Agricultural Statistics Service (NASS) and represents the number of acres planted with corn, annually by state from 2000-2019.

corn_bu - The “corn_bu” variable is obtained from the USDA’s NASS and represents the total number of bushels of corn grain produced by each state, annually from 2000-2019.

app_kg - The “app_kg” variable is obtained from the United States Geological Survey (USGS) and represents the quantity of chemical applications, in kg, used for corn production by state, annually from 2000-2019.

bu_peracre - The “bu_peracre” variable is defined manually by dividing “corn_bu” by “acres_planted”. For the purposes of the OLS analyses, it can be considered the “yield” in bushels per acre, and it is the main explained variable. This variable occurs by state, annually from 2000-2019.

app_peracre - The “app_peracre” variable is defined manually by dividing “app_kg” by “acres_planted”, and it standardizes chemical applications by the area they are used on (kg per acre). This variable occurs by state, annually from 2000-2019.

appeff - The “appeff” variable is defined manually by dividing “bu_peracre” by “app_peracre” and denotes the efficiency of chemical applications applied to corn (bushels produced per kg applied), where higher values indicate more efficiency than lower values. This variable occurs by state, annually from 2000-2019.
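As a minimal sketch of how the three derived variables relate to the raw ones, the snippet below computes them for a single state-year. It assumes yield is bushels divided by acres and application intensity is kg divided by acres; the function name `derive_variables` and the input numbers are hypothetical and are not taken from the data set.

```python
def derive_variables(acres_planted, corn_bu, app_kg):
    """Return the derived variables for one state-year observation."""
    bu_peracre = corn_bu / acres_planted   # yield in bushels per acre
    app_peracre = app_kg / acres_planted   # chemical use in kg per acre
    appeff = bu_peracre / app_peracre      # bushels produced per kg applied
    return {
        "bu_peracre": bu_peracre,
        "app_peracre": app_peracre,
        "appeff": appeff,
    }


# Hypothetical state-year used only for illustration
row = derive_variables(acres_planted=1_000_000, corn_bu=150_000_000, app_kg=1_500_000)
print(row)
```

Because the acreage cancels, appeff is equivalent to corn_bu divided by app_kg, which is why it can be read as bushels produced per kilogram of chemicals applied.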

DiD Data Frame Main Variables & Sources

All of the DiD data comes from one source, and that is the Food and Agriculture Organization (FAO) of the United Nations (UN), and unlike the previous data frame, this one is very simple. Instead of having multiple observed variables, for this data frame, we are looking at three main variables, country, year, and yield (this time yield is represented in terms of kg/ha). By looking at these three variables and understanding whether or not countries heavily regulate/ban or permit the cultivation of GE corn, a DiD analysis should be feasible. As for the frequency of the data itself, observations occur annually and by country. Now, with this being said, let’s look at the first models.

Models and Interpretations

Yield OLS Models

For the first OLS regressions, we used ge_pct and app_peracre to explain corn yields in two different specifications, one where we do not account for state fixed effects, and one where we do. Below is a summary of the two regressions:

## 
## ===============================================================
##                                 Dependent variable:            
##                     -------------------------------------------
##                                   log(bu_peracre)              
##                               OLS                  felm        
##                        No Fixed Effects     State Fixed Effects
##                               (1)                   (2)        
## ---------------------------------------------------------------
## ge_pct                     0.002***              0.002***      
##                             (0.001)              (0.0004)      
##                                                                
## app_peracre                 0.090**              0.147***      
##                             (0.043)               (0.053)      
##                                                                
## Constant                   4.646***                            
##                             (0.072)                            
##                                                                
## ---------------------------------------------------------------
## Observations                  220                   220        
## R2                           0.093                 0.578       
## Adjusted R2                  0.084                 0.554       
## Residual Std. Error    0.201 (df = 217)      0.140 (df = 207)  
## F Statistic         11.070*** (df = 2; 217)                    
## ===============================================================
## Note:                               *p<0.1; **p<0.05; ***p<0.01

Yield OLS Model Interpretations

Both regressions indicate that if all corn planted were genetically engineered, we would see a ~20% gain in corn productivity, at least in terms of yield (a coefficient of 0.002 per percentage point of ge_pct, multiplied by 100 points), and the estimate is significant at the 1% level in both specifications. Furthermore, the regressions indicate that for every additional kilogram of chemicals applied to an acre of corn, we can expect roughly a 9% or 14.7% increase in yields, depending on the specification. This might seem high, but the average value of app_peracre was only 1.44 kg. Finally, before moving on to our next analysis, we think it is worth noting that this model likely suffers from omitted variable bias: it takes only two variables into account, so it likely leaves out variables that are correlated with both yield and ge_pct. Even so, we think the 20% figure is important to note.
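Because the dependent variable is log(bu_peracre), reading a coefficient directly as a percentage is an approximation that is close for small values. The sketch below converts the reported coefficients into exact percentage changes using exp(coefficient × change) − 1. The helper name `pct_change` is our own for illustration; the coefficients are the rounded values from the table.

```python
import math


def pct_change(coef, delta):
    """Exact percent change in y implied by a log-linear coefficient
    when the regressor changes by `delta` units."""
    return (math.exp(coef * delta) - 1) * 100


ge_full = pct_change(0.002, 100)  # ge_pct moving from 0 to 100 percent
app_ols = pct_change(0.090, 1)    # one more kg per acre, no fixed effects
app_fe = pct_change(0.147, 1)     # one more kg per acre, state fixed effects

print(f"Full GE adoption: {ge_full:.1f}%")
print(f"Extra kg per acre (OLS): {app_ols:.1f}%")
print(f"Extra kg per acre (state FE): {app_fe:.1f}%")
```

With the rounded table coefficients, the exact figures come out slightly above the direct readings (about 22% rather than 20% for full adoption), so the approximation does not change the overall picture.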

Yield DiD Models

To conduct an accurate DiD analysis, and to satisfy the parallel trends requirement, we first had to find two sets of countries with similar yield trends before GE corn adoption. To do this, we plotted yields for a range of GE corn producing and non-GE corn producing countries, and selected two sets of countries based on their corn yield trends. The first set consisted of Spain and Italy, and the second set consisted of Argentina and France. Their respective trends before any GE corn adoption can be seen below:

Once we had chosen these two sets of countries, we created two new variables for each set: a TREAT variable and a POST variable. The TREAT variable was assigned a value of TRUE for Spain and Argentina, and a value of FALSE for Italy and France. The POST variable was assigned a value of TRUE if the year was after 1998 in the Spain/Italy set, and a value of TRUE if the year was after 1996 in the Argentina/France set. Using these newly created variables, regressions were run using the following formula:

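$\log(\text{YIELD}) = \beta_0 + \beta_1 \text{TREAT} + \beta_2 \text{POST} + \beta_3 (\text{TREAT} \times \text{POST}) + \varepsilon$

Here the dependent variable enters in logs, matching the results table, and $\beta_3$ is the DiD estimate. Using this formula, we obtained the following results: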

## 
## ==================================================================
##                                  Dependent variable:              
##                     ----------------------------------------------
##                                       log(yield)                  
##                       Spain/Italy Model    Argentina/France Model 
##                              (1)                     (2)          
## ------------------------------------------------------------------
## TREAT                     -0.177***               -0.641***       
##                            (0.040)                 (0.047)        
##                                                                   
## POST                       0.159***               0.254***        
##                            (0.042)                 (0.048)        
##                                                                   
## TREATTRUE:POST             0.272***               0.280***        
##                            (0.060)                 (0.068)        
##                                                                   
## Constant                   8.964***               8.829***        
##                            (0.029)                 (0.033)        
##                                                                   
## ------------------------------------------------------------------
## Observations                  70                     64           
## R2                          0.649                   0.863         
## Adjusted R2                 0.633                   0.856         
## Residual Std. Error    0.124 (df = 66)         0.136 (df = 60)    
## F Statistic         40.639*** (df = 3; 66) 125.647*** (df = 3; 60)
## ==================================================================
## Note:                                  *p<0.1; **p<0.05; ***p<0.01

Yield DiD Model Interpretations

While these DiD models indicate many things, for the purposes of this project the most important result is the interaction term, which estimates GE adoption to be responsible for a ~27-28% increase in corn yields and is significant at the 1% level in both models. While these models may be more indicative of a causal relationship than the previous OLS model, we think it is worth noting that limitations may still exist. Specifically, while extremely similar, the pre-GE adoption yield trends in both sets are not perfectly parallel, which could cause the model to overstate or understate the effects of GE adoption on corn yields.
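To make the meaning of the interaction term concrete, the sketch below rebuilds the four group means of log(yield) implied by the reported coefficients and then takes the difference of the two before/after changes. This is only an illustration of the 2x2 DiD arithmetic, using the rounded values from the table; the function name `did_from_coefficients` is hypothetical.

```python
def did_from_coefficients(const, treat, post, interaction):
    """Rebuild the four cell means of log(yield) from a 2x2 DiD regression
    and return them with the difference-in-differences estimate."""
    cells = {
        ("control", "pre"): const,
        ("control", "post"): const + post,
        ("treated", "pre"): const + treat,
        ("treated", "post"): const + treat + post + interaction,
    }
    change_treated = cells[("treated", "post")] - cells[("treated", "pre")]
    change_control = cells[("control", "post")] - cells[("control", "pre")]
    return cells, change_treated - change_control


# Rounded coefficients from the results table
spain_italy_cells, spain_italy_did = did_from_coefficients(8.964, -0.177, 0.159, 0.272)
arg_france_cells, arg_france_did = did_from_coefficients(8.829, -0.641, 0.254, 0.280)

print(f"Spain/Italy DiD (log points): {spain_italy_did:.3f}")
print(f"Argentina/France DiD (log points): {arg_france_did:.3f}")
```

The result equals the interaction coefficient by construction: the treated country's change in log yield minus the control country's change over the same period.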

Application Efficiency OLS Models

For the application efficiency OLS analysis, we defined a new variable (from the first OLS variables) called appeff, which is a ratio obtained by dividing yield per acre by applications per acre. A higher value therefore means a higher degree of application efficiency. We then plotted the progression of application efficiency over time, and this is what we saw:

The graphs indicate that corn production was initially becoming more efficient in terms of applications, but around 2006 we start to see a decline, and we wanted to determine why. For the application efficiency value to decrease, there would have to be a decrease in yields or an increase in chemical application use. Because we know that yields are not decreasing, we decided to look at chemical applications over time. Below is a graph of corn chemical use over time for the state of Iowa, taking only the most used chemicals into account:

The graph above seems to explain why we saw application efficiency fall after initially increasing. Specifically, as GE crops were adopted, the use of two of the most prominent chemicals decreased rapidly, but this effect did not last, as the use of other chemicals (specifically glyphosate) rose. After determining why we saw a decrease in application efficiency, we ran an OLS regression of appeff on the ge_pct variable, with and without state fixed effects. Below are the results:

## 
## ===================================================================
##                                   Dependent variable:              
##                     -----------------------------------------------
##                                       log(appeff)                  
##                              OLS                     felm          
##                     Without Fixed Effects  With State Fixed Effects
##                              (1)                     (2)           
## -------------------------------------------------------------------
## ge_pct                     0.002***                0.001***        
##                            (0.001)                 (0.0004)        
##                                                                    
## Constant                   4.435***                                
##                            (0.051)                                 
##                                                                    
## -------------------------------------------------------------------
## Observations                 220                     220           
## R2                          0.038                   0.659          
## Adjusted R2                 0.034                   0.641          
## Residual Std. Error    0.268 (df = 218)        0.163 (df = 208)    
## F Statistic         8.672*** (df = 1; 218)                         
## ===================================================================
## Note:                                   *p<0.1; **p<0.05; ***p<0.01

Application Efficiency OLS Model Interpretations

These results indicate that while we saw a decline in the application efficiency score after about 2006, the model estimates roughly a 0.1% to 0.2% increase in the score, depending on the specification, for every one-percentage-point gain in the share of corn planted that is genetically engineered. Essentially, this model estimates that GE crops do increase application efficiency.
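The difference between the two columns comes from how state-level differences are handled. As a rough sketch, a state fixed effects slope with a single regressor can be obtained by subtracting each state's mean from both variables before fitting (the within transformation). The data below are made up for illustration, and the helper names `slope` and `within_slope` are our own; this is not the code used to produce the table.

```python
from collections import defaultdict


def slope(x, y):
    """OLS slope of y on x with an intercept."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_slope(groups, x, y):
    """Demean x and y within each group, then fit the slope."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for g, a, b in zip(groups, x, y):
        totals[g][0] += a
        totals[g][1] += b
        totals[g][2] += 1
    xd = [a - totals[g][0] / totals[g][2] for g, a in zip(groups, x)]
    yd = [b - totals[g][1] / totals[g][2] for g, b in zip(groups, y)]
    return slope(xd, yd)


# Hypothetical data: two states with different baselines, same true slope
states = ["A", "A", "A", "B", "B", "B"]
ge_pct = [10.0, 50.0, 90.0, 40.0, 80.0, 95.0]
baseline = {"A": 4.0, "B": 5.0}
log_appeff = [baseline[s] + 0.001 * g for s, g in zip(states, ge_pct)]

pooled = slope(ge_pct, log_appeff)
within = within_slope(states, ge_pct, log_appeff)
print(f"pooled slope: {pooled:.4f}, within-state slope: {within:.4f}")
```

In this toy example the pooled slope is inflated because the state with the higher baseline also has higher GE adoption, while the within-state slope recovers the value built into the data. This is the kind of time-invariant state difference that the fixed effects specification is meant to absorb.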

Conclusion

With the analyses run within this project, even with the limitations, we feel that we have either demonstrated, or come very close to demonstrating, a causal relationship between GE corn adoption and increases in corn yields, but actually quantifying this relationship will be very tricky. With the results we have so far, we would estimate that if a true experiment were run, GE crops would see a yield roughly 20% higher than non-GE crops, since the OLS estimate is ~20% and the DiD estimates (~27-28%) are of a similar magnitude. Moving forward, we plan on running more analyses to explore these figures further. Specifically, we would like to run another DiD analysis looking at a different set of countries.

Causal Analysis of GE Effects on Agricultural Productivity

Davis, P., & Ismael, M., 2024

2024-12-04