1. Installing Python

As you embark on your journey into data analytics in accounting, one of the first technical decisions you’ll face is how to install Python. There are two widely adopted alternatives:

  1. Direct Installation from Python’s Official Website: You can download the installation files directly from Python’s official website: www.python.org

  2. Using the Anaconda Environment: Another option is to use Anaconda, which you can download from Anaconda’s official website: www.anaconda.com

Both of these methods have distinct advantages and limitations, which I will elaborate on below to help you make an informed decision.

Direct Python Installation

Pros:

Up-to-Date Libraries: With a direct installation, you often get the most recent versions of Python libraries. You can manage these libraries easily using pip, Python’s package installer.

Lean Setup: A direct installation includes only the essential features, making it a lighter package that uses less disk space.

Full Control: You have more direct control over your Python environment, including which libraries to install.

Cons:

Initial Complexity: New users might find it challenging to manage different libraries and dependencies manually.

Potential for Conflicts: Without a managed environment, there’s a greater onus on the user to resolve any library conflicts.

Anaconda Environment

Pros:

Comprehensive: Anaconda provides a robust and integrated environment that comes pre-packaged with a wide array of libraries and tools geared toward data science.

Ease of Use: It offers a more user-friendly approach to package management and environment management, making it a good choice for beginners.

Versatility: Anaconda allows you to create multiple environments, which can be beneficial if you’re working on different projects that require different library versions.

Cons:

Disk Space: Anaconda comes with many features and libraries that you may not need, consuming more disk space.

Slightly Outdated Libraries: The libraries in Anaconda can sometimes lag behind the most current versions available through pip.

Personal Experience and Recommendation

In my own journey, I started with Anaconda to take advantage of its user-friendly environment and extensive suite of pre-installed libraries. However, as my needs became more specialized and disk space more precious, I transitioned to a “pure” Python setup to benefit from the latest library versions and more direct control over my environment.

Your choice may depend on several factors, including your level of familiarity with Python, specific project requirements, and computer limitations. Both approaches are viable, but they cater to different needs and skill levels.

1.1. Anaconda


Anaconda is a highly recommended environment for readers working through this book and, more broadly, for anyone starting out in data analytics in accounting. It provides an integrated, pre-packaged set of libraries and tools designed for scientific computing and data science. To learn more or to start the installation, visit www.anaconda.com.

The exercises and case studies in this book require numerous Python libraries beyond the core installation. You could install these manually in a “pure” Python environment, but compatibility issues could disrupt your learning process. Anaconda reduces this risk by ensuring that the libraries it provides are compatible with each other and easy to install. For example, installing the GPU-enabled version of TensorFlow can be accomplished with just a few mouse clicks. In pure Python, the same task is much harder, because you need to manually install exactly matching versions of the NVIDIA GPU libraries.

Anaconda is also very popular and works the same way across all major operating systems, which simplifies cross-platform development. Its list of libraries is extensive, and it is rare (though not impossible) for a Python library to be missing from the Conda environment.

Given its ease of use, its emphasis on compatibility, and its extensive library support, Anaconda is an excellent starting point for Python learners, especially those interested in data analytics in accounting.

1.2. Installing Anaconda

To install Anaconda, download the installer from www.anaconda.com/products/individual and follow the installation instructions for your operating system. During this process, you’ll be prompted to set Anaconda as your default Python environment. For the purposes of this book, and for those primarily learning Python, it is strongly recommended to accept this option. Note: If you plan to use both Anaconda and a “pure” Python environment, consider carefully whether you want Anaconda as your default, as managing multiple Python installations can become complicated.

1.3. Updating Anaconda

Anaconda utilizes conda (conda.io) as its package and environment manager. While it’s entirely possible to manage your Anaconda environment via the graphical interface, it’s advantageous to become familiar with basic conda commands. One such command that should be executed regularly is: conda update anaconda. This command ensures that your Anaconda environment and the libraries therein are up-to-date.

For more information on Conda, go to docs.conda.io.

1.4. Installing pure Python


While the previous sections have emphasized the benefits of using Anaconda, there are indeed valid reasons for opting for a ‘pure’ Python installation. This approach offers you more direct control over your programming environment and is often favored for specialized development needs. However, be aware that this method may require more hands-on management, especially when dealing with library dependencies.

Installing Python on Windows

For those using a Windows operating system, Python does not come pre-installed, so you’ll need to download it manually. Below are the general steps, but for a more detailed guide, please refer to Python’s official documentation on: docs.python.org/3/using/windows.html#installation-steps

  1. Download: Visit Python’s official website and download the installer for the latest Python version.

  2. Run Installer: Double-click the downloaded file to start the installation process.

  3. Customization: You’ll be presented with various options. It’s generally safe to stick with the defaults, but make sure to check the box that says “Add Python to PATH” to ensure easy command-line access.

  4. Install: Click on the ‘Install Now’ button. Once the process is complete, you can verify the installation by opening a Command Prompt and typing python --version.

Installing Python on Linux

Python usually comes preinstalled on most Linux distributions. If not, it’s available as a package that you can easily install. For specific instructions tailored to your Linux distribution, consult Python’s official Linux installation guide: docs.python.org/3/using/unix.html#on-linux

  1. Check for Pre-installation: Open a terminal and type python --version or python3 --version to see if Python is already installed.

  2. Package Manager: If Python is not installed, you can usually install it using your distribution’s package manager. For Debian-based distributions like Ubuntu, this can often be done with sudo apt-get install python3.

  3. Verify Installation: Once the installation is complete, open a new terminal window and type python3 --version to confirm that Python has been successfully installed.
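The version check can also be done from inside Python itself, which additionally shows which interpreter file is actually running. The following is a minimal sketch using only the standard library; the function name describe_interpreter and the minimum version (3, 8) are assumptions made for illustration, not requirements stated in this book.

```python
import sys


def describe_interpreter():
    """Return the running interpreter's version string and executable path."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return version, sys.executable


version, executable = describe_interpreter()
print(f"Python {version} at {executable}")

# Hypothetical minimum version, chosen only for illustration.
MINIMUM = (3, 8)
if sys.version_info[:2] < MINIMUM:
    print("This interpreter is older than the assumed minimum.")
```

Save this as a file and run it with the same command you used for the version check (python or python3) to confirm that both point to the installation you expect.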

A Note on Managing Multiple Python Environments

If you’ve opted to install both Anaconda and pure Python, managing your PATH variables becomes important. Be cautious when setting either as your default environment, as having multiple Python installations can sometimes lead to complexities.
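When several installations coexist, it helps to see which one each command name resolves to on your PATH. A small sketch, assuming the commands of interest are called python and python3 (the helper name find_interpreters is hypothetical):

```python
import shutil
import sys


def find_interpreters(names=("python", "python3")):
    """Map each command name to the path found on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


for name, path in find_interpreters().items():
    print(f"{name:8} -> {path or 'not found on PATH'}")

# The interpreter executing this script, for comparison.
print(f"running  -> {sys.executable}")
```

If the paths differ from the interpreter that is running the script, more than one installation is reachable, and the order of entries in PATH decides which one a bare command starts.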

Final remarks

Whether you choose Anaconda or a ‘pure’ Python installation is contingent on your specific needs, skill level, and the projects you’ll be working on. Each has its advantages and disadvantages, and your choice will set the foundation for your Python programming journey in the realm of accounting data analytics.

1.5. Environments

Regardless of whether you opt for Anaconda or a ‘pure’ Python installation, mastering the concept of Python environments is important for several reasons:

  1. Isolation: Environments allow you to work in isolated spaces, meaning you can tailor each environment with only the libraries needed for a specific project or task. This minimizes the risk of version conflicts between libraries.

  2. Reproducibility: Utilizing environments makes it easier to share your work and collaborate with others by ensuring that everyone has the same setup, thus minimizing the infamous “it works on my machine” problem.

  3. Resource Management: By limiting each environment to only the libraries it requires, you keep each environment small, which saves disk space and makes it faster to create, update, and share.
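To make the reproducibility point concrete, one common approach is to record exactly which library versions an environment contains so that a collaborator can recreate it. The sketch below lists them with the standard importlib.metadata module; the function name installed_packages and the name==version output format are illustrative choices, not a method prescribed by this book.

```python
from importlib import metadata


def installed_packages():
    """Return sorted 'name==version' strings for the active environment."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


for pin in installed_packages():
    print(pin)
```

Running this in two different environments and comparing the output is a quick way to spot version differences behind an “it works on my machine” problem.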

Managing environments in Python

For those working with ‘pure’ Python, the standard tool for managing environments is venv. It provides an isolated Python environment in which you can install libraries independently of the system Python. Follow the instructions in Python’s official documentation to learn how to create, activate, and manage environments: docs.python.org/3/tutorial/venv.html

Key Commands:

  • Creating an Environment: python3 -m venv myenv

  • Activating an Environment:

    • On Windows: myenv\Scripts\activate

    • On Unix/MacOS: source myenv/bin/activate
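The same environment can also be created from Python code through the standard library’s venv module, which is what the command above invokes. This is only a sketch: it builds the environment in a temporary directory that is deleted afterwards, and it passes with_pip=False to keep the example quick (the command-line form installs pip by default). The helper name create_environment is hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(parent_dir, name="myenv"):
    """Create a virtual environment under parent_dir and return its path."""
    env_path = Path(parent_dir) / name
    # with_pip=False skips installing pip, which keeps this sketch fast.
    venv.create(env_path, with_pip=False)
    return env_path


# Use a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as scratch:
    env_path = create_environment(scratch)
    # Every venv contains a pyvenv.cfg file at its root.
    created = (env_path / "pyvenv.cfg").exists()
    print(f"Environment created: {created}")
```

For everyday work, the command-line form is simpler; the programmatic form is mainly useful in setup scripts.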

Managing environments in Anaconda

Anaconda offers its own environment management through its conda tool. These environments can be easily managed through the Anaconda Navigator, a graphical interface that comes with the Anaconda distribution. The official Anaconda documentation provides a comprehensive tutorial on creating and managing environments: docs.anaconda.com/anaconda/navigator/tutorials/manage-environments/

For those who prefer using the terminal, you can accomplish the same tasks with conda commands. The official guide provides an exhaustive list of commands and options: docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html

Key Commands:

  • Creating an Environment: conda create --name myenv

  • Activating an Environment: conda activate myenv
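After activating an environment with either tool, you can confirm from within Python that the activation took effect. The sketch below relies on two facts: inside a venv, sys.prefix differs from sys.base_prefix, and conda sets the CONDA_DEFAULT_ENV environment variable when an environment is active. The function name active_environment and its return strings are illustrative.

```python
import os
import sys


def active_environment():
    """Describe the environment the current interpreter is running in."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base installation.
    if sys.prefix != sys.base_prefix:
        return f"venv at {sys.prefix}"
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda environment '{conda_env}'"
    return "no virtual environment detected"


print(active_environment())
```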

Final thoughts

Understanding and effectively using Python environments will not only make your development process more efficient but also minimize potential roadblocks related to library dependencies. Whether you are an amateur coder or an experienced developer, proper environment management is key to successful, reproducible, and scalable data analytics.

1.6. Jupyter notebook

Jupyter Notebooks serve as one of the most dynamic environments for interactive computing and data analysis. Developed as a part of Project Jupyter, these notebooks offer a browser-based interface to Python, among other programming languages. Their capabilities extend far beyond mere code execution; they provide a multifaceted platform that amalgamates code, data, and descriptive text into a single, coherent document.

Jupyter Notebooks bring a host of features conducive for academic and professional work in data science and accounting:

  • Incremental Execution: They allow for the piecewise execution of code, enabling real-time evaluation and debugging.

  • Rich Output: Jupyter Notebooks support a variety of output formats including tables, images, charts, and even interactive visualizations. This is particularly beneficial for tasks such as financial modeling and data visualization.

  • Annotated Workflow: The inclusion of Markdown cells between code cells allows you to add explanatory text, mathematical equations (thanks to MathJax support), and even HTML elements, making your work more comprehensible and presentable.
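To illustrate how code and explanatory text live side by side, it may help to know that a notebook file (.ipynb) is a JSON document containing a list of cells. The sketch below builds a minimal two-cell notebook structure by hand, assuming version 4 of the notebook format; the helper name make_notebook and the revenue and cost figures are made up for illustration.

```python
import json


def make_notebook(markdown_text, code_text):
    """Build a minimal notebook dict with one Markdown cell and one code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,  # not yet executed
                "outputs": [],
                "source": code_text,
            },
        ],
    }


# Hypothetical figures, used only to give the cells some content.
notebook = make_notebook(
    "## Gross margin\nRevenue minus cost of goods sold, divided by revenue.",
    "revenue = 1200.0\ncogs = 780.0\n(revenue - cogs) / revenue",
)
print(json.dumps(notebook, indent=1))
```

In practice you would create cells in the Jupyter interface rather than by hand; the point is only that text cells and code cells are stored together in one document.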

In recent years, Jupyter Notebooks have become an industry standard in data science and academic research. Given their versatility and ease-of-use, it’s hardly surprising that the majority of data science tutorials, courses, and example projects available online utilize Jupyter Notebooks. This is equally true for accounting analytics, where the need for data-driven decision-making is ever-increasing.

While Jupyter Notebooks are not the sole avenue for Python programming, they are particularly well-suited for:

  • Beginners: The incremental execution and rich output features make it easier for newcomers to grasp Python and data analytics concepts.

  • Prototyping and Testing: The interactive nature of Jupyter Notebooks makes them excellent tools for quick code experiments, data analysis trials, and ad-hoc querying.

  • Code Sharing and Collaboration: Given their comprehensive format, Jupyter Notebooks are ideal for sharing analyses, methodologies, and results with colleagues or for educational purposes.

By the way, this book was authored using Jupyter Notebooks, embodying the principles of interactive learning and data-centric academic instruction. Jupyter Notebooks offer a harmonious blend of coding capability and textual narrative, making them an indispensable tool in the modern landscape of data analytics in accounting. Whether you’re a student, a practitioner, or a seasoned researcher, integrating Jupyter Notebooks into your workflow will likely augment both your productivity and the reproducibility of your work.

1.7. Launching Jupyter Notebooks: Your Gateway to Python Programming

Initiating Jupyter with Anaconda

Once you’ve successfully installed Anaconda, launching Jupyter Notebook is a straightforward process. You have a couple of options:

  • Via the Applications Menu: Navigate to your computer’s applications menu and search for “Jupyter Notebook.” Clicking on it will initiate the service.

  • Via Anaconda’s Command Prompt (Windows) or Terminal (Mac/Linux): Open Anaconda’s specialized command prompt (Windows) or your regular terminal (Mac/Linux) and type the following command: jupyter notebook

Note for Windows Users: It’s crucial to open the Anaconda-specific command prompt for the terminal command to function properly.

Upon successful launch, Jupyter Notebook typically opens a new browser window that showcases your file directory. Alternatively, it may generate a URL that you’ll need to manually enter into your browser’s address bar, particularly if you’re running Jupyter in a server environment.

JupyterLab is a more advanced interface that comes bundled with Anaconda. It provides a more integrated environment for working with Jupyter Notebooks as well as other types of documents and activities. To launch JupyterLab:

  • From Anaconda Navigator: Navigate to Anaconda’s applications menu and select JupyterLab.

  • From Command Prompt or Terminal: Enter the following command: jupyter-lab

Initiating Jupyter with Python

If you opted for a ‘pure’ Python installation, you can also run Jupyter Notebook or JupyterLab, although you’ll need to install them first. Here’s how:

  1. Install Jupyter: Open your terminal and run one of the following pip commands to install Jupyter Notebook or JupyterLab:

  • Notebook: pip install notebook

  • JupyterLab: pip install jupyterlab

  2. Launch Jupyter Notebook or JupyterLab: After installation, you can launch either one by running:

  • Notebook: jupyter notebook

  • JupyterLab: jupyter-lab

A browser window should automatically open, showcasing your file directory. As with Anaconda, you may need to manually enter a URL into your browser in certain instances.
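If the launch command is not recognized, a quick first check is whether the packages are visible to the interpreter you are using. A minimal sketch with the standard library follows; it assumes the importable module names match the pip package names notebook and jupyterlab, and the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can find the module."""
    return importlib.util.find_spec(module_name) is not None


for module_name in ("notebook", "jupyterlab"):
    status = "installed" if is_installed(module_name) else "missing"
    print(f"{module_name}: {status}")
```

A result of “missing” often means that pip installed the package into a different Python installation or environment than the one currently active.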

That’s it! Now you have a working Python environment, and you can start coding.