{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "08b6abf2-245e-4217-86fb-536e3d6b0241",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "**Assignment: Financial Research Control Variables Using Python**\n",
    "\n",
    "**Objective:**\n",
    "In this assignment, you will compute a set of commonly used control variables in financial research using Python. The original data comes from the CSMAR (China Stock Market & Accounting Research) database. You are expected to implement the necessary data processing, calculations, and summarize your findings effectively.\n",
    "\n",
    "---\n",
    "\n",
    "### **Instructions:**\n",
    "\n",
    "1. **Download and Load Data**\n",
    "\n",
    "   * Obtain the raw data from the CSMAR database. The data should contain financial information for multiple companies across various years(2003-2024). For the sake of this assignment, use data that includes key variables such as **Revenue**, **Total Assets**, **Equity**, **Operating Profit**, **Cash Flow**, and others relevant for financial analysis.\n",
    "\n",
    "2. **Control Variables Calculation**\n",
    "   For each company-year observation in the dataset, calculate the following control variables, which are commonly used in financial research:\n",
    "\n",
    "   * **Size**: The natural logarithm of Total Assets.\n",
    "\n",
    "     ```python\n",
    "     df['Size'] = np.log(df['Total_Assets'])\n",
    "     ```\n",
    "   * **Leverage (Lev)**: Total Debt / Total Assets.\n",
    "\n",
    "     ```python\n",
    "     df['Lev'] = df['Total_Debt'] / df['Total_Assets']\n",
    "     ```\n",
    "   * **Profitability (ROA)**: Return on Assets = Net Income / Total Assets.\n",
    "\n",
    "     ```python\n",
    "     df['ROA'] = df['Net_Income'] / df['Total_Assets']\n",
    "     ```\n",
    "   * **Growth**: The annual percentage change in Revenue.\n",
    "\n",
    "     ```python\n",
    "     df['Growth'] = df['Revenue'].pct_change() * 100\n",
    "     ```\n",
    "   * **Market-to-Book Ratio (MBR)**: Market Value of Equity / Book Value of Equity.\n",
    "\n",
    "     ```python\n",
    "     df['MBR'] = df['Market_Capitalization'] / df['Total_Equity']\n",
    "     ```\n",
    "   * **Cash Flow**: Operating Cash Flow / Total Assets.\n",
    "\n",
    "     ```python\n",
    "     df['CashFlow'] = df['Operating_Cash_Flow'] / df['Total_Assets']\n",
    "     ```\n",
    "\n",
    "3. **Additional Control Variables**\n",
    "   Add the following control variables that are also commonly used in financial research:\n",
    "\n",
    "   * **State-Owned Enterprise (SOE)**: A binary variable indicating whether the company is state-owned. You can assume that a variable `SOE` is already provided in the dataset, with `1` for state-owned enterprises and `0` for others.\n",
    "\n",
    "     ```python\n",
    "     df['SOE'] = df['SOE']  # Use existing data, or create if necessary.\n",
    "     ```\n",
    "   * **Dual Role (DUAL)**: A binary variable indicating whether the CEO also serves as the Chairman of the Board. You may need to create this variable based on the dataset.\n",
    "\n",
    "     ```python\n",
    "     df['DUAL'] = df['CEO_Chairman']  # Create if applicable.\n",
    "     ```\n",
    "   * **First Shareholder Ownership Ratio (H1)**: The percentage of shares held by the largest shareholder.\n",
    "\n",
    "     ```python\n",
    "     df['H1'] = df['H1_Shareholding']\n",
    "     ```\n",
    "   * **Management Shareholding Ratio (MSH)**: The percentage of shares held by the management team.\n",
    "\n",
    "     ```python\n",
    "     df['MSH'] = df['Management_Shareholding']\n",
    "     ```\n",
    "   * **Institutional Investor Shareholding Ratio (INSTSH)**: The percentage of shares held by institutional investors.\n",
    "\n",
    "     ```python\n",
    "     df['INSTSH'] = df['Institutional_Shareholding']\n",
    "     ```\n",
    "   * **Independent Directors Ratio (INDP)**: The ratio of independent directors on the board.\n",
    "\n",
    "     ```python\n",
    "     df['INDP'] = df['Independent_Directors'] / df['Board_Size']\n",
    "     ```\n",
    "   * **Board Size (BSIZE)**: The total number of board members.\n",
    "\n",
    "     ```python\n",
    "     df['BSIZE'] = df['Board_Size']\n",
    "     ```\n",
    "\n",
    "4. **Data Processing**\n",
    "\n",
    "   * Ensure that the dataset has no missing values in the required columns. Use appropriate methods to handle missing values (e.g., imputation, deletion, etc.).\n",
    "   * Convert variables to appropriate data types if necessary (e.g., converting year columns to integers).\n",
    "\n",
    "5. **Create New Columns**\n",
    "   Create new columns for each calculated control variable. Make sure to use meaningful names and ensure no overwriting of the original columns unless explicitly required.\n",
    "\n",
    "6. **Data Summary**\n",
    "   After processing the data and calculating the control variables, provide a summary of the dataset:\n",
    "\n",
    "   * Number of companies and years represented.\n",
    "   * A brief descriptive analysis of the control variables (mean, median, standard deviation).\n",
    "\n",
    "   Example:\n",
    "\n",
    "   ```python\n",
    "   df[['Size', 'Lev', 'ROA', 'Growth', 'MBR', 'CashFlow', 'SOE', 'DUAL', 'H1', 'MSH', 'INSTSH', 'INDP', 'BSIZE']].describe()\n",
    "   ```\n",
    "\n",
    "7. **Save Processed Data**\n",
    "   Save your processed dataset to a new file (e.g., `processed_data.csv` or `processed_data.dta`) and ensure the control variables are correctly included.\n",
    "\n",
    "---\n",
    "\n",
    "### **Deliverables:**\n",
    "\n",
    "1. Python code implementing all the steps above.\n",
    "2. A summary report that includes:\n",
    "\n",
    "   * A description of the dataset and the steps taken to clean and process the data.\n",
    "   * A table showing the descriptive statistics for the control variables.\n",
    "   * Any additional observations or issues encountered during the analysis.\n",
    "\n",
    "---\n",
    "\n",
    "**Submission Deadline:** [2026-04-12]\n",
    "\n",
    "**Submission Format:** Please submit a Python script (`.py`) and a report in a PDF format.\n",
    "\n",
    "Good luck, and make sure to follow the instructions carefully!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d5ab9a9-1454-4f89-8784-f0990452e6fe",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.14.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
