049_ Microsofts_JIT_based_Python_Project_Pyjion.txt
00:00:00 This episode, you'll learn about a project that has the potential to unlock massive innovation
00:00:04 around how CPython understands and executes code.
00:00:07 And it's coming from what many of you may consider an unlikely source, Microsoft and the recently open-sourced cross-platform .NET Core Runtime.
00:00:15 You'll meet Brett Cannon, who works in Microsoft's Azure Data Group.
00:00:19 Along with Dino Viehland, he is working on a new initiative called Pyjion, P-Y-J-I-O-N,
00:00:25 a JIT framework that can become part of CPython itself, paving the way for many new just-in-time compilation initiatives in the future.
00:00:33 This is episode number 49 of Talk Python To Me, recorded February 4th, 2016.
00:00:51 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.
00:01:09 This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy.
00:01:13 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.
00:01:20 This episode is brought to you by Hired and SnapCI.
00:01:23 Thank them for supporting the show on Twitter via @Hired_HQ and @Snap_CI.
00:01:31 Hey, everyone. I think you're going to love this episode.
00:01:33 Brett is doing some amazing work, and we talk about that in depth, but he's also a Python core developer,
00:01:39 and we spend a decent amount of time on Python 3 and moving from Python 2 to Python 3 and that whole story there.
00:01:45 I do have just one piece of news for you before we get to the interview.
00:01:49 It's just T minus 10 days until my Kickstarter for Python Jumpstart by Building 10 Apps closes.
00:01:55 The initial feedback from the early access students has been universally positive.
00:02:00 If you have backed the Kickstarter with early access, be sure to create an account at training.talkpython.fm
00:02:05 and send me a message via Kickstarter so I can get you the first six chapters, about three hours, of the course.
00:02:11 If you're not sure what I'm talking about here, check out my online course at talkpython.fm/course.
00:02:16 Now, let's hear about JIT innovation in CPython and more with Brett Cannon.
00:02:22 Brett, welcome to the show.
00:02:23 Thanks for having me, Michael.
00:02:24 I'm super excited to talk to you about this new project that you guys have going on with Python and Microsoft.
00:02:29 And yeah, we're going to dig into it. It'll be fun.
00:02:31 Yeah, I'm looking forward to it.
00:02:32 Absolutely.
00:02:33 So before we get into that topic, though, what's your story?
00:02:36 How do you get going in Python and programming and all that?
00:02:39 They're slightly long stories.
00:02:40 So getting into programming, probably my earliest experience with anything you could potentially call programming was Turtle back in third grade.
00:02:48 I was lucky enough to be in a school that had a computer lab full of Apple IIes.
00:02:52 And they'd bring us in and say, oh, look, you can do this little forward command and make this little turtle graphic draw a line and all this stuff.
00:02:59 Was that on the monitor that was just like monochrome green?
00:03:02 Yep. And that's why I think I used one of those, too.
00:03:05 Yeah. I sometimes run my terminal with that old green and black style because it's just what I started with back in the day.
00:03:11 Oh, that's awesome.
00:03:12 So I did that, but I didn't realize what the heck programming was.
00:03:15 But I always found computers kind of this fascinating black box that somehow you stick in these five-and-a-quarter-inch floppies, which dates me.
00:03:21 And somehow Where in the World Is Carmen Sandiego? plays.
00:03:24 I was like, wow, this is amazing.
00:03:26 And then in junior high, I ended up taking a summer class on computers and it involved a little bit of Apple BASIC.
00:03:33 And I really took to it.
00:03:35 I actually lucked out and got so far ahead of the class.
00:03:38 The teacher just said, yeah, you can stop coming to class if you want for the rest of the summer.
00:03:41 So that was like halfway through.
00:03:44 So I got bit kind of early, but I didn't really have any guidance or anything back then.
00:03:49 I mean, this is pre-access to the Internet, so I didn't really have any way to really know how to carry on.
00:03:54 And then when I went to junior college, my mom made me promise her that I would take a class in philosophy and a class in computer science.
00:04:01 And I did both and I loved them both.
00:04:03 But in terms of the computer science, I read through my C book within two weeks.
00:04:08 And then one night, spent six hours in front of my computer writing tic-tac-toe from scratch.
00:04:14 Using really basic terminal output.
00:04:15 And I was basically hooked for life.
00:04:17 In terms of Python.
00:04:19 That's really cool.
00:04:20 I think we all have that moment where you sit down at a computer and you haven't, maybe you've really enjoyed working with them or whatever.
00:04:28 But then you kind of get into programming and you realize, wow, eight hours have passed.
00:04:33 And it feels like I just sat down.
00:04:35 And then you're in the world.
00:04:37 That's it.
00:04:37 Brought me my dinner at my desk.
00:04:39 And you said, okay, I get it.
00:04:40 You're just into this.
00:04:42 Just go with it.
00:04:44 Here's your food.
00:04:44 Make sure you eat at some point tonight.
00:04:45 Awesome.
00:04:46 Yeah.
00:04:47 And in terms of Python, I actually ended up going to Berkeley and getting a degree in philosophy because there were some issues trying to double major like I originally planned to do.
00:04:56 But I did try to still take all the CS courses there.
00:04:59 And there was a test to basically get into the intro of CS course at Berkeley at the time.
00:05:05 And I thought they might have something about object-oriented programming.
00:05:08 And having learned C, I knew procedural, but I didn't know object-oriented programming.
00:05:11 So in fall of 2000, before I took the class in spring, I decided to try to find an object-oriented programming language to learn OO from.
00:05:20 And I was reading and all this stuff.
00:05:22 And Perl and Python caught my eye.
00:05:25 But I kept reading that Perl should be like the fifth or sixth language you learn,
00:05:28 while people kept saying, oh, Python's great for teaching.
00:05:30 I was like, all right, I'll learn Python.
00:05:31 And I did.
00:05:33 And I loved it.
00:05:33 And then I just continued to use it for anything I could and all my personal projects.
00:05:37 And just kept going and going with it.
00:05:39 And I haven't looked back since.
00:05:40 Yeah, that's really cool.
00:05:41 What language was your CS 101 course actually in?
00:05:45 Scheme, actually.
00:05:46 Interesting.
00:05:47 My CS 101 class was Scheme as well.
00:05:50 And I thought that was a very interesting choice for an introduction.
00:05:53 Yeah, it was really interesting.
00:05:55 I mean, it does kind of do away with the syntax.
00:05:58 But obviously, now being a Python user, I really understand what it means to kind of really minimize the syntax in a nice way instead of a slightly painful way with all those parentheses.
00:06:06 And it was interesting.
00:06:08 I mean, it is a nice way to try to get in procedural programming and object-oriented and functional.
00:06:14 So it was really nice to do multi-paradigm, teach you the basics kind of introduction.
00:06:19 They did actually, interestingly enough, for the last project, have us write a really basic Logo interpreter, which, funny enough, was such a bad experience for me,
00:06:28 partially because of the way it worked out in terms of having to work with another team.
00:06:32 And I had some issues with my teammates.
00:06:35 I actually kind of got turned off on language design, of all things, for a little while.
00:06:40 And then I just, over time, kept realizing I loved programming languages, learning how they worked.
00:06:44 So I just re-evaluated my view and just realized, okay, it was just a bad taste from a bad experience and realized that I actually do have this weird little fascination with programming languages.
00:06:55 And luckily got over that little issue of mine.
00:06:57 Yeah, no kidding.
00:06:58 And now you're a Python core developer, among other things, right?
00:07:01 Yeah.
00:07:01 So back to the language design, at least on the internals.
00:07:05 Yeah, yeah.
00:07:06 Awesome.
00:07:07 So we're going to talk about Pyjion, this cool new JIT extension.
00:07:14 You're going to have to tell me a little more about how you'd most correctly characterize it for CPython.
00:07:19 But before we do, I thought maybe you could give us like a high-level view of two things.
00:07:24 How CPython works, what's sort of going on when we run our code as is, right, with the interpreter.
00:07:32 And then maybe a survey of the different implementations or runtimes.
00:07:36 Because a lot of people think there's just one Python from an implementation or runtime perspective.
00:07:42 And there's actually quite a variety already, right?
00:07:44 Yeah, actually, we're kind of lucky in the Python community of having a lot of really top-quality implementations.
00:07:50 But to target your first question of how CPython works, which is, for those who don't know,
00:07:55 CPython is the version of Python you get from python.org.
00:07:59 And the reason it's called CPython is because it's implemented in C and has a C API,
00:08:04 which makes it easy to embed in stuff like Blender.
00:08:07 Anyway, basically, the way Python works is more or less like a traditional interpreted programming language
00:08:12 where you write your source code.
00:08:14 Python acts as a VM, reads the source code, parses it into individual tokens like
00:08:20 if and def and, oh, that's a plus sign and whatever.
00:08:24 And then that gets turned into what's called a concrete syntax tree, which is kind of just like the way the grammar is written kind of nests things.
00:08:32 And this is how you get your priorities in terms of precedence, like multiplication happens before plus, which happens before whatever.
00:08:40 And that all works out in the concrete syntax tree in terms of how it nests itself.
00:08:45 And then that gets passed into a compiler within Python that turns that into what's called an abstract syntax tree,
00:08:51 which is much more high level.
00:08:52 Like this is addition instead of plus and two things.
00:08:55 And this is loading a value.
00:08:58 And this is an actual number.
00:08:59 And this is a function call.
00:09:02 And then that gets passed farther down into the bytecode compiler, which will then take that AST and spit out Python bytecode.
00:09:09 And that's actually what's stored basically in your .pyc files.
00:09:13 Actually, technically, they're marshaled code objects.
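The steps described above can be followed from Python itself. This is a minimal sketch using only standard-library modules; the sample source string is made up for illustration. The concrete syntax tree step is left out because current CPython versions do not expose it, and only the code object is round-tripped through marshal, without the header that a real .pyc file carries.

```python
import ast
import io
import marshal
import tokenize

source = "x = 1 + 2 * 3\n"

# Step 1: tokenizing. Each token has a type name and the matching text.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# Step 2: the abstract syntax tree. Precedence shows up as nesting:
# the multiplication sits underneath the addition.
tree = ast.parse(source)
assignment = tree.body[0]
addition = assignment.value        # BinOp whose op is Add
multiplication = addition.right    # BinOp whose op is Mult

# Step 3: the bytecode compiler turns the AST into a code object.
code = compile(tree, "<example>", "exec")

# A .pyc file stores a marshaled code object after a small header;
# here only the code object itself is round-tripped.
restored = marshal.loads(marshal.dumps(code))

# Executing the restored code object runs the original statement.
namespace = {}
exec(restored, namespace)

print(tokens[:3])
print(ast.dump(addition))
print(namespace["x"])
```

The nesting of `multiplication` under `addition` is the precedence rule mentioned earlier: the multiplication is evaluated first because it sits deeper in the tree.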
00:09:15 And then when Python wants to execute that, it just loads up those bytecodes and just has a really big for loop that basically reads through those individual bytecodes.
00:09:24 It goes, OK, what do you want me to do?
00:09:26 All right, you want me to load a const.
00:09:27 Const is zero.
00:09:29 And that happens to correlate to None in every code object.
00:09:32 So I'm going to put None onto what's called the execution stack because Python is stack-based instead of register-based.
00:09:39 So CPUs are register-based.
00:09:40 Stack-based VMs such as Python.
00:09:43 Java is another one.
00:09:44 It's fairly common because it's easier to implement.
00:09:48 Anyway, you can do stuff like LOAD_CONST None or load a number, load another number on the stack.
00:09:53 So the stack now has two numbers.
00:09:54 And then the loop might, the ceval loop, for evaluation loop.
00:10:00 Yeah.
00:10:00 So it's worth pointing out to the listeners, I think, who maybe haven't gone and looked at the source code there.
00:10:06 When you say it's a big loop, it's like 3,000 lines of C code or something, right?
00:10:11 It's a big for loop.
00:10:13 Yeah, it literally is a massive for loop.
00:10:15 If you actually go to Python source code and you look in the Python directory, there's a file in there called ceval.c.
00:10:24 You can open that up and you will literally find nested in that file somewhere just a for loop with a huge switch statement that does nothing more than just execute these little bytecodes.
00:10:35 So like if it hits add, what it'll do is just pop two values off of what's basically a chunk of memory where we know what pointers are on the stack and just go, I'm going to take that Python object.
00:10:47 I'm going to take that Python object and execute the dunder add (__add__) in the right way or the dunder radd (__radd__) and then make that all happen.
00:10:53 Get back a Python object and stick that back on the stack and then just go back to the top of the for loop and just keep going and going and going until you're done and your program exits.
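The loop described above can be imitated in a few lines of Python. This is a toy sketch, not the code in ceval.c: the three opcode names and the `run` and `binary_add` helpers are assumptions made for illustration, and the `__add__`/`__radd__` fallback is simplified (the real rules also give subclasses priority).

```python
def binary_add(left, right):
    """Simplified addition protocol: try __add__, then fall back to __radd__."""
    result = NotImplemented
    add = getattr(type(left), "__add__", None)
    if add is not None:
        result = add(left, right)
    if result is NotImplemented:
        radd = getattr(type(right), "__radd__", None)
        if radd is not None:
            result = radd(right, left)
    if result is NotImplemented:
        raise TypeError("unsupported operand types for +")
    return result


def run(instructions, constants):
    """Toy evaluation loop: one big loop that dispatches on the opcode."""
    stack = []   # the execution stack
    pc = 0       # index of the next instruction
    while True:
        opcode, arg = instructions[pc]
        pc += 1
        if opcode == "LOAD_CONST":
            stack.append(constants[arg])
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(binary_add(left, right))
        elif opcode == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode}")


program = [
    ("LOAD_CONST", 0),
    ("LOAD_CONST", 1),
    ("ADD", None),
    ("RETURN_VALUE", None),
]
int_sum = run(program, (2, 3))
mixed_sum = run(program, (1, 2.5))   # int.__add__ declines, float.__radd__ answers
print(int_sum, mixed_sum)
```

Every iteration pays for the dispatch and for the generic `binary_add` lookup, even when both operands are plain integers.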
00:11:01 Yeah, and you can actually see that bytecode by loading up some Python module or function or class or whatever and importing the disassembly module, and you can actually have it spit out the bytecodes for, say, a function, right?
00:11:15 Yep.
00:11:16 And I do this all the time on Pyjion, actually.
00:11:18 Basically, you can import the dis module, D-I-S.
00:11:22 And in there, there's a dis function.
00:11:24 So if you go dis.dis and then pass in any callable, basically, so function, method, whatever, and it'll just print out to standard out in your REPL all the byte code.
00:11:35 And it'll give you information like what line does this correlate to?
00:11:38 What is the byte code?
00:11:40 What's the argument to that byte code?
00:11:42 The actual byte offset and a whole bunch of other interesting things.
00:11:45 And the dis module documentation actually lists most of the byte code.
00:11:50 I actually found a couple of opcodes that weren't actually documented.
00:11:53 Now there's a bug for that.
00:11:54 But the majority of the byte code is actually documented there.
00:11:57 So if you're really interested, you can have a look to see actually how we kind of break down the operations for Python for performance reasons and such.
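A short example of the dis usage described above. The `add` function is made up for illustration. No expected output is shown because the exact opcodes differ between Python versions.

```python
import dis


def add(a, b):
    return a + b


# Print the disassembly to standard out: line numbers, byte offsets,
# opcode names and their arguments.
dis.dis(add)

# The same information is available as objects for further inspection.
opnames = []
for instruction in dis.get_instructions(add):
    opnames.append(instruction.opname)
    print(instruction.offset, instruction.opname, instruction.argrepr)
```

`dis.get_instructions` is handy when you want to search or count opcodes rather than read them.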
00:12:05 Yeah, that's really interesting.
00:12:07 And for the listeners who are wanting to dig deeper into this, on show 22, I talked with Philip Guo about his sort of CPython internals graduate course he did at the University of Rochester in New York.
00:12:19 Have you seen his work?
00:12:20 No, I haven't yet.
00:12:21 He basically recorded 10 hours of a graduate computer science course studying the internals of CPython and spent a lot of time in ceval.c.
00:12:30 And it's on YouTube.
00:12:31 You can go check it out.
00:12:32 So it's really cool.
00:12:32 So that's interesting.
00:12:35 Oh, I should probably actually answer your second question, too, about all the other interpreters.
00:12:38 Yeah, so let's talk about the interpreters.
00:12:39 As I said earlier, CPython is kind of, it's the one you get from python.org and kind of the one most people are aware of.
00:12:46 But there's actually a bunch of other ones.
00:12:49 So one of the more commonly known alternative interpreters or VMs or implementations of Python is Jython, which is Python implemented in Java.
00:12:58 So a lot of people love that whenever they have to write a Java app and want some easy scripting to plug in.
00:13:04 Or have some requirement that they have to run on the JVM.
00:13:06 Apparently, it's really popular in the defense industry for some reason.
00:13:10 Interesting.
00:13:10 Once you get a VM approved, you just don't mess with it, I'd say.
00:13:13 Yeah.
00:13:14 Well, and one really cool perk of this is PyCon, every so often there's a really cool talk about flying fighter jets with Python using Jython and stuff like that.
00:13:25 So it does at least lead to some really cool talks.
00:13:27 Nice.
00:13:27 And here's the afterburner function.
00:13:29 You just call this.
00:13:30 Exactly.
00:13:32 There's IronPython, which is Python implemented in C#.
00:13:35 So that's usable from .NET.
00:13:37 So once again, it's often used for embedding in .NET applications that need scripting or anyone who needs to run on top of the CLR.
00:13:48 Those are the two big ones.
00:13:49 Obviously, in terms of direct alternatives, there's obviously PyPy, which I think a lot of people know about, which is two things.
00:13:57 There's PyPy, the implementation of Python written in Python, although technically it's a subset of Python called RPython, which is specifically restricted such that they can infer a lot of information about it.
00:14:09 So that can be compiled down straight to basically assembly.
00:14:13 And then there's PyPy, the tool chain, which they developed for PyPy, the Python implementation, which is basically this tool chain to create custom JITs for programming languages.
00:14:25 So you can take the PyPy tool chain and not just implement Python in Python, but they've done it for like PHP, for instance.
00:14:33 And so you can actually write alternative implementations of languages in RPython and have it spit out a custom JIT designed for your language.
00:14:40 Those are the key ones that have actually finished in terms of compatibility with some specific version of Python.
00:14:46 All of them currently target 2.7.
00:14:48 PyPy has support for Python 3.2, but obviously that's kind of an old support in terms of Python 3.
00:14:55 And then there's the new up-and-comer, which is Pyston, which is being sponsored by Dropbox.
00:15:00 And they're also targeting 2.7.
00:15:02 And they're trying to write a version of Python that is as compatible with CPython as possible, including the C extension API.
00:15:09 But what they're doing is they've added a JIT, or rather they're using a JIT from LLVM.
00:15:14 So they're trying to make 2.7 fast using LLVM JIT and pulling as much of the C code and API as they can from CPython to try to be compatible with extension modules, which is a common problem that PyPy, IronPython, and Jython have.
00:15:27 Right. That one actually seems to be really interesting and have a lot of potential.
00:15:32 Because if you think of companies that are sort of Python powerhouses, Dropbox is definitely among them.
00:15:39 Yeah, it definitely does not hurt when Guido went to go work there as well.
00:15:43 And they have Jessica McKellar there and several other people.
00:15:46 Benjamin Peterson works for them.
00:15:48 So they already have a couple of core devs and high up people in the Python community working there.
00:15:52 And their whole server stack in the back, I believe, is at least mostly Python.
00:15:56 Their desktop clients are Python.
00:15:58 They're definitely Python heavy there.
00:16:00 Yeah, absolutely.
00:16:02 So how does Pyjion relate to, the thing that came to mind for me when I saw it announced was, you know, a friend of mine, Craig Bernstein, sent me a message on Twitter and said, hey, you have to check this out.
00:16:13 And I'm like, oh, that is awesome.
00:16:15 And it was just, you know, a Twitter message.
00:16:17 You know, check out this JIT version of Python coming from Microsoft.
00:16:22 Well, I don't know anything about it, but maybe it's like PyPy.
00:16:26 So what are you guys actually building over there?
00:16:28 What is this?
00:16:29 Pyjion was actually started by Dino Viehland, one of my coworkers,
00:16:32 who I believe, I don't know if he's necessarily the sole creator, was definitely one of the original creators of IronPython. Back at PyCon US 2015, which was in Montreal,
00:16:43 during the language summit, Larry Hastings, the release manager for Python 3.4 and 3.5,
00:16:49 got up in front of the core developers and said, what can we do to get more people to switch to Python 3 faster?
00:16:55 Because obviously we all think Python 3 is awesome and legacy Python 2 is fine, but everyone should get off that at some point.
00:17:01 Yeah, I hear you.
00:17:02 I agree.
00:17:02 So what do you do, right?
00:17:03 Yeah, that could be a whole other question on that one, Michael.
00:17:07 So he said, what can we do?
00:17:09 What can we do?
00:17:09 And he said, performance is always a good thing.
00:17:11 People always seem to want more performance, no matter how well Python does.
00:17:15 People are always hungry for more.
00:17:16 And Dino went, yeah, that's a good idea.
00:17:18 I know, I'll see.
00:17:19 .NET just got open sourced back in April 2015.
00:17:23 And he said, you know what?
00:17:25 I will see if I can write a JIT for CPython using CoreCLR.
00:17:29 Because Dino also happens to have been on the CLR team.
00:17:32 So he knows the opcodes like the back of his hand.
00:17:35 And so he started to hack on it at the conference and actually managed to get somewhere.
00:17:40 And he premiered it at PyData Seattle back in July when we hosted it at Microsoft.
00:17:45 And I got brought on to basically help him flesh out the goals.
00:17:50 There's basically three goals.
00:17:52 One is to develop a C API for CPython to basically make it pluggable for a JIT.
00:17:58 Like one of the tough things that people have always done, like Unladen Swallow started with and Pyston is also doing, is they're directly tying a JIT into a fork of CPython, more or less, which really tightly couples it.
00:18:11 But it also means that, for instance, if LLVM does not work for your workload for whatever reason, you're kind of just stuck and it's just not an option.
00:18:18 Well, we would rather basically make it so that there's just an API to plug in a JIT.
00:18:24 And then that way CPython doesn't have to ship with a JIT, but it's totally usable by a JIT.
00:18:29 And then that way, if LLVM or CoreCLR, which is the .NET JIT or Chakra or V8 or whatever JIT you want, as long as someone basically writes the code to plug from CPython into that JIT, you can use whatever works best for you.
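The idea of a plug-in point can be sketched in plain Python. Everything here is hypothetical, including `ToyRuntime`, `set_jit_hook` and `recording_backend`; it is not the C API under discussion, only an illustration of a runtime that ships without a JIT but lets one be attached.

```python
class ToyRuntime:
    """Stand-in for the interpreter; every name here is hypothetical."""

    def __init__(self):
        self.jit_hook = None  # no JIT ships by default

    def set_jit_hook(self, hook):
        # A hook takes a function and returns a compiled callable,
        # or None to decline and leave the function to the interpreter.
        self.jit_hook = hook

    def call(self, func, *args):
        if self.jit_hook is not None:
            compiled = self.jit_hook(func)
            if compiled is not None:
                return compiled(*args)
        return func(*args)  # ordinary interpreted path


def square(x):
    return x * x


compiled_names = []


def recording_backend(func):
    # A pretend backend: it "compiles" by recording the name and
    # handing back the original function unchanged.
    compiled_names.append(func.__name__)
    return func


runtime = ToyRuntime()
plain_result = runtime.call(square, 4)     # no hook installed
runtime.set_jit_hook(recording_backend)
jit_result = runtime.call(square, 4)       # routed through the hook
print(plain_result, jit_result, compiled_names)
```

Because the runtime only knows about the hook, swapping one backend for another does not require forking the runtime.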
00:18:46 That's really cool.
00:18:47 I think it's a super noble goal to say, let's stop everybody starting from scratch, rebuilding the CPython sort of implementation and weaving in their version of a JIT and saying, let's just find a way so that you don't have to write that ever again.
00:19:05 And you just plug in the pieces.
00:19:07 Yeah, exactly.
00:19:08 And actually, one of the other goals we have with this is not only developing the API, but goal number two is to write a JIT for CPython using the CoreCLR and using that to drive the API design that we need and want to push back up to CPython eventually.
00:19:25 But the third goal is actually to design kind of a JIT framework for CPython such that we write the framework that drives the code emission for the JIT.
00:19:35 And then all the JIT people have to do is basically just write to the interface of this framework and don't have to worry about specific semantics necessarily.
00:19:45 So, for instance, you would be able to, as a JIT author, go, OK, I need to know how to emit an integer onto a stack and I need to know how to do add or add int.
00:19:55 But then the framework would actually handle going, OK, well, here's the Python bytecode that implements add.
00:20:01 Let's actually do an add call or, hey, I know this thing is actually an integer.
00:20:05 Let's do an add int call and not just a generic Python add and be able to handle that level of difference so that there's a lot less busy work that's common to all the JITs like type inference and such and be able to extract that out so that it's even easier to add a JIT to CPython.
00:20:21 So is that like two levels?
00:20:23 Like on one hand, you have a straight C API at the CPython level and then optionally you could choose to use the C++ framework that makes it so you do less work and you plug in your sort of events or steps?
00:20:34 Yeah, exactly.
00:20:35 It's getting the bare minimum into CPython so that CPython at least has this option without everyone having to do a fork and as well as pushing down a level to a separate project where the common stuff is extrapolated out and everyone can just build off the same baseline.
00:20:49 And then the only thing that has to really differ is what's unique to the JITs.
00:20:53 And then that way, everyone's work is as simple as possible to try to make this work.
00:20:56 OK, that makes a lot of sense.
00:21:04 This episode is brought to you by Hired.
00:21:11 Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.
00:21:16 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company.
00:21:25 Typically, candidates receive five or more offers within the first week and there are no obligations ever.
00:21:30 Sounds awesome, doesn't it?
00:21:31 Well, did I mention the signing bonus?
00:21:33 Everyone who accepts a job from Hired gets a $1,000 signing bonus.
00:21:36 And as Talk Python listeners, it gets way sweeter.
00:21:39 Use the link hired.com/talkpythontome and Hired will double the signing bonus to $2,000.
00:21:46 Opportunity's knocking.
00:21:47 Visit hired.com/talkpythontome and answer the call.
00:21:56 Would you still be able to support things like method inlining and things like that with the C++ framework?
00:22:03 We don't know yet, but there's technically no reason why not.
00:22:08 What's actually really interesting is we started all this work and we actually weren't ready to premiere any of this yet.
00:22:15 We've been doing this out in the open on GitHub.
00:22:17 But as you mentioned, Michael, people started to tweet it and then it made it to Reddit and then it made it to Hacker News.
00:22:21 And suddenly everyone's asking questions and stuff.
00:22:23 But in the middle of all this, there's been a lot of work literally the past, I don't know, maybe two months of various core developers putting in a lot of time and effort trying to speed up CPython itself.
00:22:34 And part of this is actually trying to cache method objects so that they can get cached in the code object and actually not have to, every time you try to execute like a call bytecode,
00:22:47 not have to go to like the object, pull out the method object and then call that, but actually just cache the method object.
00:22:52 I already have it.
00:22:53 I don't need to re-access that attribute on the object.
00:22:56 And so it's already starting to bubble its way up into CPython.
00:23:00 And there shouldn't technically be any reason why we can't just piggyback off of that and just go, oh, well, they've already cached this or use a similar technique of basically,
00:23:08 if the object hasn't changed, I really don't need to worry about previous versions of this being different.
00:23:14 So I can just cache it and reuse it and just save myself the hassle of having to get a method back.
00:23:19 Or same thing with built-ins, right?
00:23:21 Like if you ever want to call len, some people cache it locally for performance.
00:23:26 But the work that's going on is actually going to make that a moot point because it's going to start to notice when the built-ins and the globals for your code have not changed.
00:23:35 And just go, well, I've already cached len locally because I already know I've used it previously.
00:23:39 So I might as well just pull that object immediately out of my cache instead of trying it in the local namespace, not having it there, going to the global namespace, not having it there, then going to the built-in namespace and having to pull out len again for every time through a loop, for instance, and call that.
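The lookup order described above, and the manual trick of caching len locally, can be seen in the bytecode. This is a small sketch; the function names are made up, and the default-argument idiom is just one common way people cache a built-in by hand.

```python
import dis


def uncached(items):
    # `len` is not a local name, so it is looked up in the global
    # namespace and then in the built-in namespace.
    return len(items)


def cached(items, len=len):
    # Binding `len` as a default argument makes it a local name, so the
    # global and built-in lookups are skipped inside the function.
    return len(items)


def opnames(func):
    return [instruction.opname for instruction in dis.get_instructions(func)]


print("LOAD_GLOBAL" in opnames(uncached))
print("LOAD_GLOBAL" in opnames(cached))
```

The work discussed here aims to make the first form as cheap as the second, so the hand-written version stops being necessary.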
00:23:53 Yeah, that's really great.
00:23:54 And I suspect you could just say, here's the JIT compiled machine instructions.
00:23:58 Just cache that or something like this.
00:24:00 Yeah, exactly.
00:24:02 So a lot of this work that's happening directly in CPython bubbles down both directions into helping JITs in various ways, right?
00:24:10 Like this whole detecting what state a namespace is from the last time you looked at it.
00:24:15 Has it changed at all or not?
00:24:17 That's probably going to end up in CPython itself as an implementation detail.
00:24:20 But it also means all the JITs will be able to go, oh, look, the built-in namespace hasn't changed.
00:24:25 So that means if I've cached len, I don't need to worry about it being changed.
00:24:28 I don't have to pay for a dictionary lookup.
00:24:30 I can just pull it right out of my array of cached objects and just go with it.
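The idea of checking whether a namespace has changed before trusting a cached value can be sketched in pure Python. This is a minimal illustration only: VersionedNamespace and GuardedLookup are hypothetical names, and the real mechanism would live inside the interpreter rather than in a dict subclass.

```python
class VersionedNamespace(dict):
    """Hypothetical dict that bumps a counter when an item is set or deleted.

    Only __setitem__ and __delitem__ are tracked here; other mutating
    methods such as update() are ignored to keep the sketch short.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


class GuardedLookup:
    """Caches one name from a namespace and reuses it until the version changes."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.cached_version = None
        self.cached_value = None
        self.misses = 0  # number of real dictionary lookups performed

    def get(self):
        if self.cached_version != self.namespace.version:
            # Namespace changed (or first use): pay for the dictionary lookup.
            self.cached_value = self.namespace[self.name]
            self.cached_version = self.namespace.version
            self.misses += 1
        return self.cached_value


namespace = VersionedNamespace(len=len)
lookup = GuardedLookup(namespace, "len")
lookup.get()
lookup.get()                      # served from the cache
namespace["len"] = lambda x: 0    # rebinding invalidates the cache
lookup.get()
print(lookup.misses)
```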
00:24:34 Okay.
00:24:34 Yeah, that sounds like it'll be great regardless of whether you're talking about a JIT or just running your code, right?
00:24:41 Yeah, no, it's going to be fantastic.
00:24:42 Everyone's going to win on that one.
00:24:44 Yeah, that's cool.
00:24:44 One of the things that I think is surprisingly slow in Python is calling methods, right?
00:24:51 Yeah.
00:24:52 It's more expensive maybe than it should be.
00:24:54 What other stuff kind of falls into that class that you can think of?
00:24:58 So the reason, just to give an explanation of why that's so slow, is if you look at what you can do with a method or function call,
00:25:08 Python's got a really rich set of semantics, right?
00:25:11 We have positional arguments.
00:25:13 We have keyword arguments.
00:25:14 We have star args and we have star star kwargs.
00:25:19 We have keyword only arguments in Python 3.
00:25:21 I mean, there are default values or not.
00:25:24 There's a lot of different ways to try to build this stuff up into something that we can use to call a function with.
00:25:32 And some of them are really, really safe.
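The range of calling conventions listed above can be seen in a single signature. The function name and parameters below are hypothetical; the sketch only shows how much the interpreter has to sort out for one call.

```python
def describe(a, b=2, *args, key_only, flag=False, **kwargs):
    """Return how each argument was bound, to make the matching visible."""
    return {
        "a": a,                # positional, required
        "b": b,                # positional with a default value
        "args": args,          # extra positionals collected by *args
        "key_only": key_only,  # keyword-only, required (Python 3)
        "flag": flag,          # keyword-only with a default value
        "kwargs": kwargs,      # extra keywords collected by **kwargs
    }


# Positional, extra positional, keyword-only, and extra keyword arguments.
first = describe(1, 10, 20, 30, key_only="x", extra=True)

# The caller can also unpack a sequence and a mapping into the call.
second = describe(*[1], **{"key_only": "y"})

print(first)
print(second)
```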
00:25:33 Right.
00:25:33 And maybe even closures as well, right?
00:25:35 On top of that.
00:25:36 Yeah.
00:25:37 Actually, luckily, that's not actually too costly for the actual call.
00:25:42 It's just when it comes time to look up the value, you've got to work your way up.
00:25:46 But that kind of ties into it, right?
00:25:47 So that's the other kind of expensive thing you have to do in Python is there's the cost of making a call itself because it just takes so much effort to build up what all the arguments should be.
00:25:57 And then there's the cost of just looking up the method or the function, right?
00:26:03 Because as you mentioned, there's closures.
00:26:05 So you have kind of this, you have local scope.
00:26:08 You have this potential closure scope, which are like cell variables or free variables.
00:26:13 If you're the guy calling out, you've got your global namespace.
00:26:16 You've got your built-in namespace.
00:26:18 And then that's on top of whether or not you've defined like a dunder getattr (__getattr__) method on your object.
00:26:23 This is going to have its own set of code to call to try to figure out what the heck you want, whether it can get it for you.
00:26:29 And that's the other real expense is trying to basically access attributes, which methods happen to be.
00:26:35 So that's one of the reasons that the calls can be so expensive.
00:26:37 It's not just the cost of getting the object, but it's also the call itself and just basically preparing for it.
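Two of the lookup costs mentioned above, closure variables and a custom __getattr__, can be shown in a short sketch. The class and function names are hypothetical.

```python
class Lazy:
    def __init__(self):
        self.real = "found normally"

    def __getattr__(self, name):
        # Called only after the normal attribute lookup has failed,
        # so it adds extra Python-level work to every missing attribute.
        return f"computed {name}"


def make_counter():
    count = 0  # a cell variable of make_counter

    def increment():
        nonlocal count  # a free variable of increment
        count += 1
        return count

    return increment


obj = Lazy()
print(obj.real)     # found by the normal lookup
print(obj.missing)  # falls through to __getattr__

counter = make_counter()
counter()
print(make_counter.__code__.co_cellvars)  # names shared with inner functions
print(counter.__code__.co_freevars)       # names borrowed from the enclosing scope
```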
00:26:43 Okay, interesting.
00:26:44 And this caching in CPython, you know, putting Pyjion aside for a moment, that would make a big difference?
00:26:50 Yeah.
00:26:51 Yuri, I'm going to butcher his last name, so I honestly don't want to try.
00:26:54 Initial.
00:26:55 Yury Selivanov, I think.
00:26:58 Yeah.
00:26:59 Yuri, I believe it's Y.
00:27:00 I believe he lives in Toronto, actually.
00:27:04 He has actually developed some new opcodes.
00:27:07 For instance, load method and call method, which directly by themselves
00:27:13 have a slight performance perk because they kind of skip some steps
00:27:17 you typically have to take to make a method ready.
00:27:19 But Yuri's also been the one working on this caching stuff, building off of Victor Stinner's dictionary versioning.
00:27:26 And what he's doing is with his call methods and load methods, he's basically grabbing the unbound methods and sticking them on the stack and just calling them directly without doing some extra work.
00:27:39 But with the caching, that thing he sticks on the stack, he can actually squirrel away and say, hey, next time I come to this call method or load method, I can just pull it right out of this cache as long as stuff hasn't changed in the namespaces above me.
00:27:50 And that's how he's trying to make method calls cheaper.
00:27:54 It's basically storing away the method object and fetching it right back if he can make sure for a fact that nothing has changed since last time he tried to get that object out.
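The difference between a bound method and the underlying unbound function, which is the extra step described above, can be seen from Python itself. This is only an illustration of the objects involved, with hypothetical names; it does not show the opcodes or the cache.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self, greeting):
        return f"{greeting}, {self.name}"


g = Greeter("Brett")

# Accessing g.greet builds a bound method object wrapping the function and g.
bound = g.greet

# The plain function stored on the class is what ultimately gets called.
func = Greeter.greet

print(bound.__func__ is func)  # the bound method wraps the same function
print(bound.__self__ is g)     # and remembers the instance

# Calling the function directly with the instance skips the wrapper
# and produces the same result as the ordinary method call.
print(func(g, "Hi") == g.greet("Hi"))
```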
00:28:03 Okay, that's awesome.
00:28:04 What's the time frame?
00:28:05 Any ideas?
00:28:06 Is it still just experimental or?
00:28:07 That's a good question.
00:28:09 So there's a PEP.
00:28:10 So Victor Stinner has started what he's called FAT Python, F-A-T.
00:28:14 You can Google for that.
00:28:16 I'm sure you'll find it.
00:28:17 He currently has three PEPs, actually.
00:28:19 PEP 509 handles dictionary versioning, which is important for namespaces and caching.
00:28:25 Because you need to know if something like in your global namespace or your built-in namespace or even your local namespace has changed because all namespaces in Python are dictionaries, which is why you can introspect so much.
00:28:37 PEP 510 is adding guards to bytecode so that he can do stuff like add a guard saying, hey, if globals hasn't changed and built-ins hasn't changed, use this cached version of len.
00:28:51 This is before Yuri's stuff had started.
00:28:53 And then he's implemented PEP 511.
00:28:56 He's trying to add, actually, an API for doing AST transformations so that you can basically plug in custom AST transformations to go like, well, if you're doing a number plus a number, we can just make it a number and skip the plus.
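The "number plus a number" transformation can be sketched with the standard library ast module. This is a minimal illustration using ast.NodeTransformer, not the plug-in API that the PEP proposes; the class and function names are hypothetical.

```python
import ast


class FoldAddition(ast.NodeTransformer):
    """Replace 'constant + constant' with a single precomputed constant."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold nested additions first
        if (
            isinstance(node.op, ast.Add)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node


def fold(source):
    """Parse an expression, fold numeric additions, and return the new tree."""
    tree = FoldAddition().visit(ast.parse(source, mode="eval"))
    ast.fix_missing_locations(tree)
    return tree


tree = fold("1 + 2 + x")
# The inner '1 + 2' is now a single constant; only '+ x' remains at run time.
print(ast.dump(tree.body))
print(eval(compile(tree, "<folded>", "eval"), {"x": 4}))
```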
00:29:09 As of right now, PEP 510 and 511, I don't know where they're headed quite yet.
00:29:14 But 509 seems to be fairly well accepted.
00:29:18 And it's just a question of Victor finalizing the PEP and the design exactly and getting it accepted.
00:29:24 So I really don't see any reason at all why that won't make it into Python 3.6.
00:29:28 And Yuri's stuff, he's already got patches and has benchmarked it and showed it working.
00:29:32 And there's some discussion about whether or not his current approach is the best or not.
00:29:37 But I personally don't see any reason why any of this won't make it in 3.6 either.
00:29:41 3.6.
00:29:42 Okay.
00:29:42 That's pretty excellent.
00:29:43 That's not too far out.
00:29:44 Yeah, no.
00:29:45 I think we're due to hit beta in September.
00:29:48 So as long as all this can wrap up by then,
00:29:51 it'll all land in Python 3.6.
00:29:53 And I should mention all this stuff, Yuri's stuff, I think, is looking like it's adding up to
00:29:58 between 5% and 10% across-the-board speedups.
00:30:01 And depending on how your code looks, I think you're seeing up to 20% faster.
00:30:05 So definitely wins.
00:30:07 Yeah, that's a really big deal.
00:30:08 Okay.
00:30:09 Awesome.
00:30:10 I want to talk about the core CLR a little bit.
00:30:12 But before we do, you said something that I didn't expect you to say when we were talking
00:30:17 about JITs and plugging in JITs.
00:30:18 And that was V8 or Chakra.
00:30:21 That is awesome.
00:30:22 So somehow we could plug in the JavaScript engine from Chrome V8 or the one from IE and
00:30:29 Edge.
00:30:29 What would that look like?
00:30:30 We haven't really explored it yet, but it's definitely an idea we had.
00:30:33 Actually, before Chakra went open source, the Chakra team reached out to Dino and said,
00:30:37 hey, we think this might be useful to your project.
00:30:40 The thinking is, because JavaScript is as dynamic as it is, and all these JITs have to be designed
00:30:46 to JIT quickly, because obviously, if you're in your browser, no one wants to wait for their
00:30:51 favorite web-based email client to start running.
00:30:53 So they're really fast at the start.
00:30:55 But they also have to handle dynamism really well, because JavaScript, just like Python,
00:31:00 can easily have attributes added and removed and changed at any time.
00:31:05 And so they have to be really flexible in terms of how they handle that kind of workload.
00:31:09 While Core CLR obviously does its best to be a really good all-around JIT, obviously, its
00:31:15 heavy users are like F# and C# and more static-based languages.
00:31:19 The thinking is that if we try to use a JIT that worries about a language that's as dynamic
00:31:25 as JavaScript, we should be able to actually piggyback on all that work and actually have
00:31:30 a JIT that works really well for Python, because it's already designed to deal with all the
00:31:34 dynamism a programming language like Python and JavaScript have.
00:31:37 That's super interesting.
00:31:38 And I think if you have two distinct examples working against your API as different as the
00:31:46 CLR and JavaScript, you'll have a pretty robust API, right?
00:31:51 Yeah.
00:31:52 And that's the other thinking, too, is we want to get the Core CLR version done and passing
00:31:57 all of the Python test suite as much as reasonably possible so that we can go, OK, our JIT framework
00:32:04 that we've designed to help drive these JITs covers all the possible edge cases and basically
00:32:11 is good enough that if you implement these things in a reasonable fashion, you will get Python
00:32:15 compatibility.
00:32:16 And then that way we can just plug in and make sure that all this stuff just works both in
00:32:22 two completely different JITs targeted to different types of languages and have it just all fall
00:32:27 through.
00:32:27 And honestly, it's a nice way to do performance comparisons for what kind of JIT would probably
00:32:32 work best for Python.
00:32:32 Awesome.
00:32:33 That sounds like a really good idea.
00:32:34 I've done a fair amount of work with C# and the CLR.
00:32:38 And I know what the Core CLR is, but I suspect most listeners, when they hear .NET, they think,
00:32:44 oh, it's a Windows thing.
00:32:46 But you guys actually are doing quite a bit of different stuff now that Satya's in charge.
00:32:53 There's kind of a new mandate, right?
00:32:54 So tell people about the Core CLR.
00:32:56 I believe it was last year.
00:32:58 It was before I joined Microsoft this past July.
00:33:01 Basically, all of .NET was open sourced.
00:33:05 So previously, it was all this closed source thing that was very Windows only, except for
00:33:10 Mono, which kind of initially reverse engineered a bunch of things.
00:33:13 And then Microsoft said, oh, you know, well, we can at least open source, like I believe,
00:33:17 like the test suite and some other things for you to test your compatibility.
00:33:20 But Satya Nadella, as CEO of Microsoft, has really pushed for open source at Microsoft,
00:33:27 both its use, but also contributing and doing things in the open, both as in starting projects
00:33:32 from scratch that Microsoft has done and open sourcing those and also giving back to pre-existing
00:33:37 open source projects.
00:33:38 And one of the things they did was they completely open sourced .NET.
00:33:41 So .NET actually, I don't know if they've done the official release yet, but if you look at
00:33:46 their continuous integration tests at least, they're passing on Linux and OS X on top of Windows.
00:33:52 For instance, Pyjion right now is Windows only purely because of momentum and laziness on
00:33:58 Dino's and my part.
00:33:59 And it has nothing to do with using Core CLR because Core CLR uses like CMake for its builds.
00:34:05 So it's already got cross-platform build scripts set up and all that.
00:34:10 It's just basically that Dino and I, for Pyjion,
00:34:12 haven't bothered to rewrite the Visual Studio solution file in CMake to be able to run it on
00:34:18 Linux or OS X.
00:34:18 I think that's going to breathe a lot of new interest into sort of the whole CLR and
00:34:24 the C# side of things from people that are just saying, look, Windows is not an option
00:34:30 for whatever reason for us.
00:34:31 Yeah.
00:34:32 And I really hope it does, too, because I did Java development at Google.
00:34:37 And honestly, I like C# a lot more.
00:34:40 Microsoft has done a really good job of shepherding that language forward and continuously evolving
00:34:46 it.
00:34:46 Well, I don't think Oracle has done such a great job with Java.
00:34:50 And C# has just done a better job of going forward continuously.