2/15/2022: Profiling and Performance
A collection of notes to go over in class, to keep things organized.
NOTES:
I’ll try to have a break every hour or so – ping me if I forget!
pylint: Duplicate Code
Now THIS is an annoying one. PyLint gives an error for duplicate code:
************* Module users
users.py:1:0: R0801: Similar lines in 2 files
==user_status:[4:13]
==users:[4:13]
import dataclasses
from dataclasses import dataclass
from typing import Optional, Dict
from loguru import logger
from pymongo.errors import DuplicateKeyError
from pymongo.results import DeleteResult
from mongo_collection import MongoCollection (duplicate-code)
This is particularly annoying, because having the same import block in multiple files is NOT an error, nor is it code that can reasonably be de-duplicated.
Another place you may have duplicate code is tests.
In my solution, I was able to add:
# pylint: disable=duplicate-code
to the files with the duplicate code, and the error went away. But that doesn’t always work.
It turns out this is a long-standing (2014!) issue in pylint:
https://github.com/PyCQA/pylint/issues/214
The solution is to disable it in a .pylintrc file:
[BASIC]
disable=
    duplicate-code
Caution! In many cases, the duplicate-code check is a good one! So don’t turn it off until you’ve linted your code with it on.
Packages
Big topic – but it’s pretty key.
This should have been covered in Course 1 – but as a reminder, let’s take a look:
https://uwpce-pythoncert.github.io/ProgrammingInPython/modules/Packaging.html#packages-and-packaging
Context Managers
Let’s talk about context managers!
There are some notes in the Course 1 materials under Extra Topics:
https://uwpce-pythoncert.github.io/ProgrammingInPython/modules/ContextManagers.html
Luis’ Solution used a context manager – let’s take a look at that:
from pymongo import MongoClient


class MongoDBConnection:
    """MongoDB Connection"""

    def __init__(self, host='127.0.0.1', port=27017):
        """Be sure to use the IP address, not the name, for local Windows."""
        self.host = host
        self.port = port
        self.connection = None

    def __enter__(self):
        # Open the connection when the with block starts.
        self.connection = MongoClient(self.host, self.port)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Always close the connection, even if the block raised.
        self.connection.close()
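To see what the with statement actually does with a class like this, here is a small sketch that runs without a MongoDB server. FakeClient and Connection are hypothetical stand-ins I made up for the demo (they are not part of pymongo or of Luis’ solution); the point is only that __exit__ runs, and the connection gets closed, even when the block raises.

```python
class FakeClient:
    """Hypothetical stand-in for pymongo.MongoClient (no server needed)."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.closed = False

    def close(self):
        self.closed = True


class Connection:
    """Same shape as the MongoDBConnection class, built on FakeClient."""

    def __init__(self, host="127.0.0.1", port=27017):
        self.host = host
        self.port = port
        self.connection = None

    def __enter__(self):
        self.connection = FakeClient(self.host, self.port)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.connection.close()
        return False  # falsy: any exception from the block keeps propagating


conn = Connection()
try:
    with conn as mongo:
        # "mongo" is whatever __enter__ returned: the Connection itself.
        raise ValueError("something went wrong inside the block")
except ValueError:
    pass

print(conn.connection.closed)  # True
```

With the real class, the body of the with block would use mongo.connection to get at databases and collections.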
Example Code that’s getting a bit confused:
https://github.com/uw-continuum/python-320-assignment-05-busbykt
Break Time!
10min break:
Mongo Issues
Transactions?
Quoting the MongoDB docs:
In MongoDB, an operation on a single document is atomic. Because you can use embedded documents and arrays to capture relationships between data in a single document structure instead of normalizing across multiple documents and collections, this single-document atomicity obviates the need for multi-document transactions for many practical use cases.
However, there are some cases where you want to operate on multiple collections as a single action.
In recent versions, MongoDB does provide a transaction option:
https://pymongo.readthedocs.io/en/stable/api/pymongo/client_session.html#transactions
If you did build your system with two collections – one for users, and one for status updates – then a transaction might make sense. Let’s give that a try:
Luis’ solution:
(not published yet)
Let’s take a look.
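For reference, here is a minimal sketch of what a multi-collection transaction could look like with PyMongo. It is not Luis’ solution: the database name, the collection names (users, status_updates), and the user_id field are all assumptions for illustration. It uses client.start_session() and session.start_transaction(), and it assumes the client is connected to a replica set, since transactions are not available on a standalone server.

```python
def delete_user_and_statuses(client, user_id, db_name="social_network"):
    """Delete a user and all of their status updates as one transaction.

    client is assumed to be a pymongo.MongoClient connected to a replica set.
    The database, collection, and field names are made up for this sketch.
    """
    db = client[db_name]
    with client.start_session() as session:
        # start_transaction() commits when the block exits normally and
        # aborts if an exception propagates out of it.
        with session.start_transaction():
            db["status_updates"].delete_many(
                {"user_id": user_id}, session=session
            )
            result = db["users"].delete_one({"_id": user_id}, session=session)
    # True only if a user document was actually removed.
    return result.deleted_count == 1
```

Note that every operation has to be passed session=session; an operation without it runs outside the transaction.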
Using Mongo in a native way
The way this assignment was set up, it’s very natural to use two collections, just like you did with PeeWee.
But then you needed to manually keep them in sync – e.g. remove status messages when you removed a user.
Is there another way? Let’s take a look!
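One possibility, sketched here under assumptions: embed the status updates in the user document as an array, so there is only one collection and nothing to keep in sync. The field names (name, statuses, status_id, text) are hypothetical, and users is assumed to be a pymongo Collection. The calls used are update_one with the $push operator and delete_one.

```python
def new_user_document(user_id, name):
    """Build a user document with an embedded (initially empty) status list."""
    return {"_id": user_id, "name": name, "statuses": []}


def add_status(users, user_id, status_id, text):
    """Append a status to the user's embedded list; True if the user existed."""
    result = users.update_one(
        {"_id": user_id},
        {"$push": {"statuses": {"status_id": status_id, "text": text}}},
    )
    return result.modified_count == 1


def delete_user(users, user_id):
    """Delete the user; the embedded statuses go away with the document."""
    result = users.delete_one({"_id": user_id})
    return result.deleted_count == 1
```

Each of these is a single-document operation, so it is atomic without a transaction. The trade-off is that a single document is limited to 16 MB, so this fits best when the embedded list stays reasonably small.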
Break Time!
10min break
Profiling and Performance
Performance Approaches:
This week has a lot of disparate material in it.
And some of it is pretty advanced (getting your compiler set up for Cython, etc.)
So: Do read it – Do try to do some of it, but don’t worry too much if you can’t figure it all out.
But hopefully you’ll remember the ideas later on in your Python careers, and you can learn it for real then.
What you really should be able to do at this stage:
Basic Timing of code: both whole programs and little bits.
Basic Profiling – where are the bottlenecks?
An understanding of what data structures to use where.
So: for this week, once you’ve got everything working, do some timing, do some profiling, figure out how to make the bottlenecks faster, and report what you’ve found.
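As a quick illustration of timing “little bits” and of why the choice of data structure matters, here is a sketch using the standard library timeit module. The sizes and repeat counts are arbitrary choices for the demo; the actual numbers will vary by machine.

```python
import timeit

N = 10_000
as_list = list(range(N))
as_set = set(as_list)
needle = N - 1  # worst case for the list: the last element

# Time 1,000 membership tests against each structure.
list_time = timeit.timeit(lambda: needle in as_list, number=1_000)
set_time = timeit.timeit(lambda: needle in as_set, number=1_000)

print(f"list membership: {list_time:.6f} s for 1,000 lookups")
print(f"set membership:  {set_time:.6f} s for 1,000 lookups")
```

The list has to be scanned element by element, while the set does a hash lookup, so the set should come out much faster here.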
About performance and profiling:
Here are some of my notes on the topic – for an overview:
https://uwpce-pythoncert.github.io/ProgrammingInPython/modules/Profiling.html
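For a taste of basic profiling, here is a small sketch using the standard library cProfile and pstats modules. The two functions (slow_unique, fast_unique) are made-up examples, not from the assignment; the idea is just to see which function the report points to as the bottleneck.

```python
import cProfile
import io
import pstats


def slow_unique(items):
    """De-duplicate using a list: every 'in' check scans the list."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def fast_unique(items):
    """De-duplicate using dict keys, which keep insertion order."""
    return list(dict.fromkeys(items))


def main():
    data = [i % 500 for i in range(20_000)]
    assert slow_unique(data) == fast_unique(data)


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

You can also profile a whole script without changing it by running python -m cProfile -s cumulative your_script.py.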
A context manager timer:
Since we just talked about context managers – let’s do this little exercise and make a handy timer:
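Here is one possible version of such a timer, as a minimal sketch. The class name Timer, the label argument, and the elapsed attribute are my own choices for illustration. It uses time.perf_counter(), which is meant for measuring short durations.

```python
import time


class Timer:
    """Context manager that reports how long its with block took."""

    def __init__(self, label="block"):
        self.label = label
        self.elapsed = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Runs even if the block raised, so we always get a timing.
        self.elapsed = time.perf_counter() - self._start
        print(f"{self.label} took {self.elapsed:.6f} seconds")
        return False  # do not swallow exceptions


with Timer("summing a million ints") as t:
    total = sum(range(1_000_000))
```

After the block, t.elapsed holds the time in seconds, so you can use it in a report as well as read the printed line.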