:orphan:

.. _notes_lesson08:

###############################
3/1/2022: Functional Techniques
###############################


A collection of notes to go over in class, to keep things organized.

**NOTES:**

I'll try to have a break every hour or so -- ping me if I forget!


One small note: ``for: else``
-----------------------------

The ``else`` clause of a ``for`` loop is confusing and rarely used, but really handy when you do need it. Here is an example from one of my tests:

.. code-block:: python

    # need to check if at least one was correct
    # user: 'bwinkle678' should have status: 'st12455'
    user = snw.search_user('bwinkle678')
    for status_update in user.status_updates:
        if status_update.status_id == 'st12455':
            break
    else:
        assert False, "id: 'st12455' not found in user 'bwinkle678'"

As a mnemonic, I like to think of it as "else not break".
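
Here is a self-contained sketch of the same pattern that you can run without the real ``snw`` object. The function name ``find_status`` and the status ids are made up for illustration:

```python
def find_status(status_ids, wanted):
    """Return a message saying whether `wanted` is in `status_ids`."""
    for status_id in status_ids:
        if status_id == wanted:
            # break skips the else clause below
            result = f"found {wanted}"
            break
    else:
        # runs only when the loop finished without hitting break
        result = f"{wanted} not found"
    return result


print(find_status(["st001", "st12455"], "st12455"))
print(find_status(["st001", "st002"], "st12455"))
print(find_status([], "st12455"))  # empty loop: no break, so else runs
```

Note the last call: a loop over an empty sequence never breaks, so the ``else`` clause still runs.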


Results from the pymongo ``insert_many()`` call
-----------------------------------------------

One of the tricks of using pymongo's ``insert_many()`` is that when you pass in a whole bunch of documents to insert, there is no single result -- they all could have passed, they all could have failed, or anything in between.

If anything went wrong, then it raises a ``BulkWriteError``.

But what went wrong? And what went right?

pymongo adds a ``.details`` attribute to the ``BulkWriteError`` that has a lot of information:

.. code-block:: python

    # excerpt: the handler around the insert_many() call
    except BulkWriteError as err:
        details = err.details
        for error in details['writeErrors']:
            logger.error(f"user_id: {error['keyValue']['_id']} Failed to write")
        return details['nInserted']
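
To make the shape of ``details`` a bit more concrete, here is a sketch that works on a hand-written dictionary rather than a live database. The sample values are made up for illustration and are not real server output; the keys it relies on (``writeErrors``, ``nInserted``, and the per-error ``index`` and ``errmsg``) are ones pymongo reports. The helper name ``split_results`` is hypothetical. It also assumes the insert was unordered (``ordered=False``), so every document was attempted; with the default ordered insert the server stops at the first error and the later documents are never tried.

```python
def split_results(documents, details):
    """Split `documents` into (inserted, failed) using BulkWriteError details.

    Assumes an unordered insert, so anything not listed in
    details['writeErrors'] was written.
    """
    # each write error carries the position of the offending document
    failed_at = {err["index"]: err["errmsg"] for err in details["writeErrors"]}
    inserted = [doc for i, doc in enumerate(documents) if i not in failed_at]
    failed = [(documents[i], msg) for i, msg in sorted(failed_at.items())]
    return inserted, failed


# Hand-written sample in the shape of err.details: NOT real server output.
docs = [{"_id": "u1"}, {"_id": "u2"}, {"_id": "u1"}]
sample_details = {
    "writeErrors": [
        {"index": 2, "code": 11000, "errmsg": "duplicate key", "op": docs[2]},
    ],
    "nInserted": 2,
}

inserted, failed = split_results(docs, sample_details)
print(f"{len(inserted)} inserted, {len(failed)} failed")
for doc, msg in failed:
    print(f"user_id: {doc['_id']} failed to write: {msg}")
```

Using ``index`` to look up the original document avoids depending on which extra fields the server includes in each error.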


Let's look at this in my example solution:

``Examples/lesson07/ConcurrentMongo``

Look in ``social_network.py``: ``SocialNetwork.add_users()``


DataSet
=======

This week's assignment involves building a version of your Social Network code with a functional approach, using an extension to Peewee known as DataSet:

https://docs.peewee-orm.com/en/latest/peewee/playhouse.html#dataset

One thing I noted in the docs:

"The aims of the DataSet module are to provide: A simplified API for working with relational data, along the lines of working with JSON. ..."

Which aligns with my impression of DataSet -- it feels a bit like working with Mongo.

Luis has more experience than I do with DataSet, so he's going to give you an introduction.


Break Time!
===========

10min break


Multiprocessing Issues
======================


Multiprocessing and pickling
----------------------------

A number of you saw this error:

::

    File "/Users/chris/miniconda3/envs/py3/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
    File "/Users/chris/miniconda3/envs/py3/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)

    TypeError: cannot pickle '_thread.lock' object

I got that too, when I tried to set it up this way:

.. code-block:: python

    import multiprocessing

    import pandas as pd

    # NOTE: this is the version that fails with the TypeError above.
    # snw, filename and CHUNK_SIZE are defined elsewhere.
    processes = []
    chunk_number = 0
    for chunk in pd.read_csv(filename,
                             chunksize=CHUNK_SIZE,
                             iterator=True):
        print(f"CHUNK {chunk_number}")
        data = ({'user_id': row['USER_ID'],
                 'email': row['EMAIL'],
                 'user_name': row['NAME'],
                 'user_last_name': row['LASTNAME']
                 } for index, row in chunk.iterrows()
                )
        proc = multiprocessing.Process(target=snw.add_users, args=(data,))
        processes.append(proc)
        proc.start()
        chunk_number += 1
    for proc in processes:
        proc.join()

So what's wrong here?

**NOTE:**

This same code DOES work with multithreading -- why is that???
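
One way to poke at this: the traceback shows ``multiprocessing`` pickling the ``Process`` object (which includes its ``target`` and ``args``) in order to send it to the new process, and not everything can be pickled. The sketch below uses a hypothetical ``Client`` class that holds a lock, as a stand-in for something like a database client (that the real client holds a lock is my assumption, going by the error message). It also tries a generator expression like ``data`` above. Threads share memory with the code that started them, so nothing needs to be pickled.

```python
import pickle
import threading


class Client:
    """Hypothetical stand-in for an object that holds a lock internally."""

    def __init__(self):
        self._lock = threading.Lock()

    def add_users(self, data):
        with self._lock:
            return len(list(data))


def try_pickle(obj):
    """Return None if obj pickles cleanly, else a description of the error."""
    try:
        pickle.dumps(obj)
    except Exception as err:  # pickle raises TypeError or PicklingError
        return f"{type(err).__name__}: {err}"
    return None


client = Client()
rows = ({"user_id": n} for n in range(3))  # a generator, like `data` above

print(try_pickle(client.add_users))  # bound method drags the Client along
print(try_pickle(rows))              # generators cannot be pickled either
print(try_pickle(list(rows)))        # a plain list of dicts is fine
```

A possible way out, under those assumptions, is to pass a plain list to the child and create the client inside the function that runs there.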

Would one of you like to share your successful solution? Or shall we look at mine?


"multiprocessing must be in ``__name__ == "__main__"``"
-------------------------------------------------------

In the official docs:

https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods

And in various googleable sources, we are told that Processes must be started inside an ``if __name__ == "__main__":`` block.

Really? Could that possibly be true?

Well, sort of.

It does NOT mean that you can't put Process creation (and starting) in functions, classes, etc. -- pretty much anywhere.

The examples are very misleading:

[look at the examples in docs (under "Safe importing of main module")]

Let's see what it actually says:

"Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such a starting a new process)."


Let's look at my timer code:

Examples/lesson07/ConcurrentMongo/timing.py
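
Separately from my timer code, here is a minimal sketch of what that sentence from the docs allows. The names ``square``, ``report`` and ``run_in_processes`` are hypothetical. The processes are created and started inside an ordinary function; only the top-level call that kicks everything off sits behind the guard. With the spawn start method each child imports the main module again, and without the guard that import would try to start more processes.

```python
import multiprocessing


def square(number):
    """The actual work; a plain function the child can import."""
    return number * number


def report(number, results):
    """Run in the child: do the work and send the answer back."""
    results.put(square(number))


def run_in_processes(numbers):
    """Create and start Processes from inside a function: this is fine."""
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=report, args=(n, results))
             for n in numbers]
    for proc in procs:
        proc.start()
    # collect results before joining so the children can flush their queues
    squares = sorted(results.get() for _ in procs)
    for proc in procs:
        proc.join()
    return squares


if __name__ == "__main__":
    # Only this kick-off needs the guard: it must not run again
    # when a child process re-imports this module.
    print(run_in_processes([1, 2, 3]))
```

So the rule is about what happens at import time, not about where the ``Process`` objects are built.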


Windows vs \*nix
----------------

Stephen did some experiments with the same code on Windows and a Raspberry Pi running Linux.

Let's take a look.

https://docs.google.com/spreadsheets/d/17879dX9pvfTGF5Dpjsikm-MHKIs5oyakgm6J2S1K7bQ/edit#gid=1860662791


Using a Queue
-------------

A Queue makes a lot of sense for this goal: you probably don't know how large a CSV file you are going to read in -- so how big should the chunks be?

But you do know how many processors you have.

A Queue lets you create one or more "tasks" and then set up a defined number of processes to work on them.

But it is a bit tricky to manage -- when do you put the tasks on the queue? When do you know it's done?

I did it with a ``JoinableQueue``, which is pretty slick.

Shall we look?

Jared did it with a regular Queue but had an issue -- let's check that out.
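
For reference, the general ``JoinableQueue`` pattern can be sketched like this. This is not my solution or Jared's, just a minimal illustration: ``handle_chunk`` is a hypothetical stand-in for the real work (summing numbers instead of writing users), and using ``None`` as a "no more work" sentinel is one common convention, not the only option.

```python
import multiprocessing
import os


def handle_chunk(chunk):
    """Hypothetical stand-in for the real work on one chunk."""
    return sum(chunk)


def worker(tasks, results):
    """Pull chunks off the queue until a None sentinel arrives."""
    while True:
        chunk = tasks.get()
        if chunk is None:
            tasks.task_done()
            break
        results.put(handle_chunk(chunk))
        tasks.task_done()  # tell the queue this task is finished


def run(chunks, n_workers):
    """Process a list of chunks with a fixed number of worker processes."""
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for proc in workers:
        proc.start()
    for chunk in chunks:
        tasks.put(chunk)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    tasks.join()  # blocks until task_done() was called for every put()
    totals = sorted(results.get() for _ in chunks)
    for proc in workers:
        proc.join()
    return totals


if __name__ == "__main__":
    chunks = [[1, 2], [3, 4], [5, 6], [7, 8]]
    # number of workers comes from the machine, not from the data size
    print(run(chunks, n_workers=min(4, os.cpu_count() or 1)))
```

The number of workers is fixed up front, while the number of chunks can be whatever the file turns out to need; ``tasks.join()`` answers the "when is it done?" question.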


Break Time!
===========

10min break


Closures
========

Closures can be a tricky topic.

A key part of it is understanding "Scope" in Python.

There are notes and examples in Canvas, but if we have a bit of time, let's go over some notes:

https://uwpce-pythoncert.github.io/ProgrammingInPython/modules/Closures.html

(These are found in the PY310 "Extra Topics")
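
As a small warm-up, here is a minimal sketch of a closure (the names ``make_counter`` and ``counter`` are made up for illustration). The inner function keeps access to ``count`` from the enclosing scope even after ``make_counter`` has returned, and ``nonlocal`` is what lets it rebind that variable.

```python
def make_counter(start=0):
    """Return a function that remembers `count` between calls."""
    count = start

    def counter():
        nonlocal count  # rebind the variable in the enclosing scope
        count += 1
        return count

    return counter


first = make_counter()
second = make_counter(10)
print(first(), first(), second())
# the remembered value lives in the function's closure cell
print(first.__closure__[0].cell_contents)
```

Each call to ``make_counter`` creates a separate scope, so ``first`` and ``second`` count independently.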