38bece23b72c742bad817a35ecd94ae7f7ded8d9
Django/Django - Optimizing views using the ORM.md
| ... | ... | @@ -0,0 +1,137 @@ |
| 1 | +https://medium.com/@marcnealer/django-optimizing-views-using-the-orm-1392c736ba01 |
|
| 2 | + |
|
| 3 | +# Django. Optimizing views using the ORM | by Marc Nealer | Medium |
|
| 4 | + |
|
| 5 | + |
|
| 6 | +[Marc Nealer](https://medium.com/@marcnealer) |
|
| 7 | + |
|
| 8 | +I’ve been using Django since version 1.2. That far back, Django didn’t have a built-in migration tool, but there was an add-on called South, which sometimes worked. |
|
| 9 | + |
|
| 10 | +From then until now, there has been one major issue with Django: the speed and memory problems caused by the ORM. If we graphed time and memory use against the number of database calls, we would find a nice line curving upwards. So how do we deal with this? |
|
| 11 | + |
|
| 12 | +Understanding when db calls are being made |
|
| 13 | +------------------------------------------ |
|
| 14 | + |
|
| 15 | +The first and most common mistake Django programmers make is not understanding when calls are being made. |
|
| 16 | + |
|
| 17 | +To start with, let’s take a look at this snippet: |
|
| 18 | + |
|
| 19 | +```python |
|
| 20 | +from django.http import HttpResponse |
|
| 21 | + |
|
| 22 | +def example1_1(request): |
|
| 23 | +    return HttpResponse("<h1>Hello World</h1>") |
|
| 24 | +``` |
|
| 25 | + |
|
| 26 | + |
|
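| 27 | +Your first thought is that there are no ORM calls being made here, but you could be wrong. With the session and authentication middleware enabled, the request object passed to the view is tied to a record in the sessions table and, if you’re logged in, to your User record. Django loads both lazily, so the queries fire as soon as anything (middleware, a context processor, a template) touches request.session or request.user. Calls can be made without you writing a single ORM line. |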
|
| 28 | + |
|
| 29 | +Now look at this. |
|
| 30 | + |
|
| 31 | +```python |
|
| 32 | +from django.http import HttpResponse |
|
| 33 | +from django.contrib.auth.models import User |
|
| 34 | + |
|
| 35 | +def example1_2(request): |
|
| 36 | +    recs = User.objects.all() |
|
| 37 | +    names = [x.username for x in recs] |
|
| 38 | +    return HttpResponse(f"<h1>{''.join(names)} </h1>") |
|
| 39 | +``` |
|
| 40 | + |
|
| 41 | + |
|
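| 42 | +This is a VERY common mistake to make. The call to get the User records doesn’t actually get the records; a QuerySet is lazy and only hits the database when it is evaluated. Iterating through it runs one query, but that query pulls every column of every row and builds a full model object for each record, just to read one field. It gets worse if the loop touches a related object (a foreign key, for example), because each access then makes its own call to the database, once for each and every record. |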
|
| 43 | + |
|
| 44 | +values and values\_list |
|
| 45 | +----------------------- |
|
| 46 | + |
|
| 47 | +For querying the database there are simple solutions. If we use the values() or values\_list() methods, the call to the database does not build full record objects, but instead returns only the specific values we ask for. |
|
| 48 | + |
|
| 49 | +```python |
|
| 50 | +from django.http import HttpResponse |
|
| 51 | +from django.contrib.auth.models import User |
|
| 52 | + |
|
| 53 | +def example1_2_fixed(request): |
|
| 54 | +    names = ''.join(User.objects.all().values_list('username', flat=True)) |
|
| 55 | +    return HttpResponse(f"<h1>{names} </h1>") |
|
| 56 | +``` |
|
| 57 | + |
|
| 58 | + |
|
| 59 | +This results in a single call to the database that fetches only the usernames. So the first thing to take care of is knowing what data you want, and only going to get the items you need. Making generalized calls to the database and iterating through the results is not something you should do. If you don’t know already, values() returns a dictionary per record and values\_list() returns a tuple per record. If you want only one value, then adding flat=True to values\_list() results in a single flat list of values. The results are still QuerySet objects, but behave the same as lists of dictionaries or tuples for the most part. |
|
| 60 | + |
|
| 61 | +There is one problem with this, and that is getting related fields. You can fetch them using the “field\_\_field” notation, but the dictionary keys then use these names, which makes the data ugly. You usually have to do some cleanup after the fact. |
|
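The cleanup step mentioned above can be sketched in plain Python. This is only an illustration: `flatten_keys`, the `renames` mapping and the sample rows are hypothetical, and the list of dictionaries stands in for what something like `Blog.objects.values("title", "user__username")` would hand back.

```python
def flatten_keys(rows, renames):
    """Return new dicts with ORM-style lookup keys swapped for friendlier names."""
    return [
        {renames.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


# Stand-in for the rows a values() call with a related field would return.
rows = [
    {"title": "First post", "user__username": "megaman"},
    {"title": "Second post", "user__username": "megatron"},
]

# Keys not listed in the mapping are passed through unchanged.
cleaned = flatten_keys(rows, {"user__username": "author"})
```

The original rows are left untouched, so the same data can be reshaped differently for another template. Recent Django versions also accept keyword expressions in values(), which may let you do the renaming in the query itself.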
| 62 | + |
|
| 63 | +Do remember this section when you are passing data to templates. Passing QuerySets to templates can still result in more database calls being made as the template walks through records and their related fields. Preparing and cleaning the data in the view, then sending that to the template, is far better and more efficient than doing the work in the template. |
|
| 64 | + |
|
| 65 | +**select\_related(), prefetch\_related()** |
|
| 66 | + |
|
| 67 | +These can speed up queries by fetching related records at the same time as the main record: select\_related() does it with a join in the same query, prefetch\_related() with one extra query per relation. They don’t always work the way you think. In fact, in some cases, they can slow things down. I say they are best avoided. |
|
| 68 | + |
|
| 69 | +Nest Your Queries |
|
| 70 | +----------------- |
|
| 71 | + |
|
| 72 | +All the elements in a single command via the ORM are compiled and sent to the database as a single query. |
|
| 73 | + |
|
| 74 | +```python |
|
| 75 | +from django.http import HttpResponse |
|
| 76 | +from django.contrib.auth.models import User |
|
| 77 | + |
|
| 78 | +from .models import Blog |
|
| 79 | +def example2(request): |
|
| 80 | +    author_titles = Blog.objects.filter(user__in=User.objects.filter(username__icontains="mega") |
|
| 81 | +        ).values_list('title', flat=True) |
|
| 82 | +    return HttpResponse(f"<h1>{''.join(author_titles)} </h1>") |
|
| 83 | +``` |
|
| 84 | + |
|
| 85 | + |
|
| 86 | +Now I know I could have done this in a different way, but I did it like this to make a point. There are two querysets in this, one on Blog and one on User, but the ORM will merge them into a single query, with the User lookup embedded as a subquery. |
|
| 87 | + |
|
| 88 | +Separating queries using Javascript and an API. |
|
| 89 | +----------------------------------------------- |
|
| 90 | + |
|
| 91 | +This is a method that most Django backend programmers will not have considered, but it works very well. The slowdown and performance issues are related to the number of calls made in each request, or more to the point, in the thread that is spawned for each request. You could start playing around with threads to run ORM calls, but that’s kind of ugly. This is a much better solution. Pass the minimum information needed for the records to the template using values() or values\_list() and send out the page. If more information is needed, such as a set of images or a list of comments, don’t get it in the main view. Instead, set up an API that will obtain this data based on the main record’s id. |
|
| 92 | + |
|
| 93 | +For example, you display a blog. The blog has a lot of comments attached. You can get the blog and the comments and pass them to the template, but there are instances where this results in too many calls. Instead, you pass just the blog to the template, and JavaScript or jQuery on the page takes the id of the blog and calls an API to return the comments in a separate request. |
|
| 94 | + |
|
| 95 | +This works really well, especially for business-related data, where you might need to extract and transform data from all over the place to create charts. Also remember that an API call does not have to return JSON or XML. Use the power of Django and get it to create and return HTML snippets instead. That is much easier than reading the JSON and getting JavaScript to create the HTML. |
|
| 96 | + |
|
| 97 | +It also works very well for writing. Don’t send a list of 1000 updates to the backend in a single block. Instead, send them to an API one or a few at a time. “One at a time” is a little misleading here, as you can make multiple calls to the API and so do multiple updates simultaneously. |
|
| 98 | + |
|
| 99 | +Writing to the DB |
|
| 100 | +----------------- |
|
| 101 | + |
|
| 102 | +This is the place most applications have issues. They want to do mass updates to the database and the ORM just doesn’t cut it. There are three ways this can be resolved. |
|
| 103 | + |
|
| 104 | +The first I’ve already sort of mentioned. Send your updates in small batches or one by one to an API endpoint. In this case, you can send the data from a separate Python app to the Django application. aiohttp works really well for opening a large number of connections to the API endpoint. |
|
| 105 | + |
|
| 106 | +**Transaction** |
|
| 107 | + |
|
| 108 | +This one is really useful if you need to create a group of related records. |
|
| 109 | + |
|
| 110 | +```python |
|
| 111 | +from .models import Blog, Comment |
|
| 112 | +from django.db import transaction |
|
| 113 | + |
|
| 114 | +# Create your views here. |
|
| 115 | +def transaction_example(records): |
|
| 116 | +    with transaction.atomic(): |
|
| 117 | +        for rec in records: |
|
| 118 | +            blog = Blog.objects.create(user_id=rec["id"], title=rec["title"], body=rec["body"])  # user_id sets the foreign key column directly |
|
| 119 | +            for comment in rec["comments"]: |
|
| 120 | +                Comment.objects.create(user_id=rec["id"], comment=comment, blog=blog) |
|
| 121 | +``` |
|
| 122 | + |
|
| 123 | + |
|
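| 124 | +Using this, each create() still sends its INSERT to the database straight away, but nothing is committed until the context manager ends, and if any write raises an exception the whole group is rolled back. Committing once for the group, rather than once per record, is also usually faster. This is the best route when you have main records with lots of different related records to be written out. |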
|
| 125 | + |
|
| 126 | +**bulk\_create() and bulk\_update()** |
|
| 127 | + |
|
| 128 | +You might have looked at these to resolve your issues, but they work on one object type at a time, so if you have a situation like the one above, with lots of related records, they don’t work out so well. |
|
| 129 | + |
|
| 130 | +If you are creating a lot of objects of the same type, then bulk\_create() works well. bulk\_update() doesn’t really work out so well. You still need to get each record you’re updating; bulk\_update() just does the saves all at the same time. It can be of use, but is kind of limited. |
|
| 131 | + |
|
| 132 | +Conclusion |
|
| 133 | +---------- |
|
| 134 | + |
|
| 135 | +The Django ORM is not built for bulk reading and writing. It’s made for getting small amounts of data to render a page. Transactions can help with some of the larger updates, but they still have their limits. The same can be said for bulk\_update() and bulk\_create(). For large and very large sets of changes and creates, it’s best to work via the API solution. It really works well. You can even put them together: send 100 records at a time to an API that then does a bulk\_create(). |
|
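Splitting the work into groups of 100 is the part that needs no Django at all. Here is a minimal sketch, assuming the records are plain dictionaries; the `batches` helper and the sample data are hypothetical, and each chunk it yields would become the body of one call to your API endpoint.

```python
from itertools import islice


def batches(records, size=100):
    """Yield successive lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical payload: 250 records waiting to be written.
records = [{"id": n, "title": f"Post {n}"} for n in range(250)]

# Each chunk would be sent as one request; here we only inspect the sizes.
sizes = [len(chunk) for chunk in batches(records)]
```

Because the helper works on any iterable, the records can come from a generator or a file reader without loading everything into memory first.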
| 136 | + |
|
| 137 | +With reading and querying, remember that extracting the data using values() and values\_list() is the best option. If you have more complex queries, it’s still best to get data via these and then use this data in another query, which in turn gets its data via values() and values\_list(). They also save a LOT of time when rendering templates. |
|
| ... | ... | \ No newline at end of file |