Python package to bring nd-arrays to the GPU
EelcoHoogendoorn/ThreadWeave
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"> <head> <meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <meta name=ProgId content=Word.Document> <meta name=Generator content="Microsoft Word 14"> <meta name=Originator content="Microsoft Word 14"> <link rel=File-List href="ThreadWeave_bestanden/filelist.xml"> <!--[if gte mso 9]><xml> <o:DocumentProperties> <o:Author>Eelco</o:Author> <o:LastAuthor>Eelco</o:LastAuthor> <o:Revision>2</o:Revision> <o:TotalTime>131</o:TotalTime> <o:Created>2012-10-24T23:44:00Z</o:Created> <o:LastSaved>2012-10-24T23:44:00Z</o:LastSaved> <o:Pages>9</o:Pages> <o:Words>3593</o:Words> <o:Characters>20484</o:Characters> <o:Lines>170</o:Lines> <o:Paragraphs>48</o:Paragraphs> <o:CharactersWithSpaces>24029</o:CharactersWithSpaces> <o:Version>14.00</o:Version> </o:DocumentProperties> <o:OfficeDocumentSettings> <o:AllowPNG/> </o:OfficeDocumentSettings> </xml><![endif]--> <link rel=themeData href="ThreadWeave_bestanden/themedata.thmx"> <link rel=colorSchemeMapping href="ThreadWeave_bestanden/colorschememapping.xml"> <!--[if gte mso 9]><xml> <w:WordDocument> <w:TrackMoves>false</w:TrackMoves> <w:TrackFormatting/> <w:PunctuationKerning/> <w:ValidateAgainstSchemas/> <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid> <w:IgnoreMixedContent>false</w:IgnoreMixedContent> <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText> <w:DoNotPromoteQF/> <w:LidThemeOther>EN-US</w:LidThemeOther> <w:LidThemeAsian>X-NONE</w:LidThemeAsian> <w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript> <w:Compatibility> <w:BreakWrappedTables/> <w:SnapToGridInCell/> <w:WrapTextWithPunct/> <w:UseAsianBreakRules/> <w:DontGrowAutofit/> <w:SplitPgBreakAndParaMark/> </w:Compatibility> <m:mathPr> <m:mathFont m:val="Cambria Math"/> <m:brkBin m:val="before"/> 
<m:brkBinSub m:val="--"/> <m:smallFrac m:val="off"/> <m:dispDef/> <m:lMargin m:val="0"/> <m:rMargin m:val="0"/> <m:defJc m:val="centerGroup"/> <m:wrapIndent m:val="1440"/> <m:intLim m:val="subSup"/> <m:naryLim m:val="undOvr"/> </m:mathPr></w:WordDocument> </xml><![endif]-->
<style> <!-- /* Font Definitions */ @font-face {font-family:Wingdings; panose-1:5 0 0 0 0 0 0 0 0 0; mso-font-charset:2; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:0 268435456 0 0 -2147483648 0;} @font-face {font-family:Wingdings; panose-1:5 0 0 0 0 0 0 0 0 0; mso-font-charset:2; mso-generic-font-family:auto; mso-font-pitch:variable; 
mso-font-signature:0 268435456 0 0 -2147483648 0;} @font-face {font-family:Cambria; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:roman; mso-font-pitch:variable; mso-font-signature:-536870145 1073743103 0 0 415 0;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4; mso-font-charset:0; mso-generic-font-family:swiss; mso-font-pitch:variable; mso-font-signature:-520092929 1073786111 9 0 415 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin-top:0in; margin-right:0in; margin-bottom:10.0pt; margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman";} h2 {mso-style-priority:9; mso-style-qformat:yes; mso-style-link:"Kop 2 Char"; mso-style-next:Standaard; margin-top:10.0pt; margin-right:0in; margin-bottom:0in; margin-left:0in; margin-bottom:.0001pt; line-height:115%; mso-pagination:widow-orphan lines-together; page-break-after:avoid; mso-outline-level:2; font-size:13.0pt; font-family:"Cambria","serif"; mso-ascii-font-family:Cambria; mso-ascii-theme-font:major-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:major-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:major-latin; mso-bidi-font-family:"Times New Roman"; color:#4F81BD; mso-themecolor:accent1;} h3 {mso-style-priority:9; mso-style-qformat:yes; mso-style-link:"Kop 3 Char"; mso-style-next:Standaard; margin-top:10.0pt; margin-right:0in; margin-bottom:0in; margin-left:0in; margin-bottom:.0001pt; line-height:115%; mso-pagination:widow-orphan lines-together; page-break-after:avoid; mso-outline-level:3; font-size:11.0pt; font-family:"Cambria","serif"; mso-ascii-font-family:Cambria; 
mso-ascii-theme-font:major-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:major-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:major-latin; mso-bidi-font-family:"Times New Roman"; color:#4F81BD; mso-themecolor:accent1;} p.MsoHeader, li.MsoHeader, div.MsoHeader {mso-style-priority:99; mso-style-link:"Koptekst Char"; margin:0in; margin-bottom:.0001pt; mso-pagination:widow-orphan; tab-stops:center 3.25in right 6.5in; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman";} p.MsoFooter, li.MsoFooter, div.MsoFooter {mso-style-priority:99; mso-style-link:"Voettekst Char"; margin:0in; margin-bottom:.0001pt; mso-pagination:widow-orphan; tab-stops:center 3.25in right 6.5in; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman";} p.MsoTitle, li.MsoTitle, div.MsoTitle {mso-style-priority:10; mso-style-unhide:no; mso-style-qformat:yes; mso-style-link:"Titel Char"; mso-style-next:Standaard; margin-top:0in; margin-right:0in; margin-bottom:15.0pt; margin-left:0in; mso-add-space:auto; mso-pagination:widow-orphan; border:none; mso-border-bottom-alt:solid #4F81BD 1.0pt; mso-border-bottom-themecolor:accent1; padding:0in; mso-padding-alt:0in 0in 4.0pt 0in; font-size:26.0pt; font-family:"Cambria","serif"; mso-ascii-font-family:Cambria; mso-ascii-theme-font:major-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:major-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:major-latin; mso-bidi-font-family:"Times New Roman"; color:#17365D; mso-themecolor:text2; mso-themeshade:191; letter-spacing:.25pt; 
mso-font-kerning:14.0pt;} p.MsoTitleCxSpFirst, li.MsoTitleCxSpFirst, div.MsoTitleCxSpFirst {mso-style-priority:10; mso-style-unhide:no; mso-style-qformat:yes; mso-style-link:"Titel Char"; mso-style-next:Standaard; mso-style-type:export-only; margin:0in; margin-bottom:.0001pt; mso-add-space:auto; mso-pagination:widow-orphan; border:none; mso-border-bottom-alt:solid #4F81BD 1.0pt; mso-border-bottom-themecolor:accent1; padding:0in; mso-padding-alt:0in 0in 4.0pt 0in; font-size:26.0pt; font-family:"Cambria","serif"; mso-ascii-font-family:Cambria; mso-ascii-theme-font:major-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:major-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:major-latin; mso-bidi-font-family:"Times New Roman"; color:#17365D; mso-themecolor:text2; mso-themeshade:191; letter-spacing:.25pt; mso-font-kerning:14.0pt;} p.MsoTitleCxSpMiddle, li.MsoTitleCxSpMiddle, div.MsoTitleCxSpMiddle {mso-style-priority:10; mso-style-unhide:no; mso-style-qformat:yes; mso-style-link:"Titel Char"; mso-style-next:Standaard; mso-style-type:export-only; margin:0in; margin-bottom:.0001pt; mso-add-space:auto; mso-pagination:widow-orphan; border:none; mso-border-bottom-alt:solid #4F81BD 1.0pt; mso-border-bottom-themecolor:accent1; padding:0in; mso-padding-alt:0in 0in 4.0pt 0in; font-size:26.0pt; font-family:"Cambria","serif"; mso-ascii-font-family:Cambria; mso-ascii-theme-font:major-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:major-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:major-latin; mso-bidi-font-family:"Times New Roman"; color:#17365D; mso-themecolor:text2; mso-themeshade:191; letter-spacing:.25pt; mso-font-kerning:14.0pt;} p.MsoTitleCxSpLast, li.MsoTitleCxSpLast, div.MsoTitleCxSpLast {mso-style-priority:10; mso-style-unhide:no; mso-style-qformat:yes; mso-style-link:"Titel Char"; mso-style-next:Standaard; mso-style-type:export-only; margin-top:0in; margin-right:0in; margin-bottom:15.0pt; 
margin-left:0in; mso-add-space:auto; mso-pagination:widow-orphan; border:none; mso-border-bottom-alt:solid #4F81BD 1.0pt; mso-border-bottom-themecolor:accent1; padding:0in; mso-padding-alt:0in 0in 4.0pt 0in; font-size:26.0pt; font-family:"Cambria","serif"; mso-ascii-font-family:Cambria; mso-ascii-theme-font:major-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:major-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:major-latin; mso-bidi-font-family:"Times New Roman"; color:#17365D; mso-themecolor:text2; mso-themeshade:191; letter-spacing:.25pt; mso-font-kerning:14.0pt;} p.MsoNoSpacing, li.MsoNoSpacing, div.MsoNoSpacing {mso-style-priority:1; mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0in; margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman";} p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph {mso-style-priority:34; mso-style-unhide:no; mso-style-qformat:yes; margin-top:0in; margin-right:0in; margin-bottom:10.0pt; margin-left:.5in; mso-add-space:auto; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman";} p.MsoListParagraphCxSpFirst, li.MsoListParagraphCxSpFirst, div.MsoListParagraphCxSpFirst {mso-style-priority:34; mso-style-unhide:no; mso-style-qformat:yes; mso-style-type:export-only; margin-top:0in; margin-right:0in; margin-bottom:0in; margin-left:.5in; margin-bottom:.0001pt; mso-add-space:auto; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; 
font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman";} p.MsoListParagraphCxSpMiddle, li.MsoListParagraphCxSpMiddle, div.MsoListParagraphCxSpMiddle {mso-style-priority:34; mso-style-unhide:no; mso-style-qformat:yes; mso-style-type:export-only; margin-top:0in; margin-right:0in; margin-bottom:0in; margin-left:.5in; margin-bottom:.0001pt; mso-add-space:auto; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman";} p.MsoListParagraphCxSpLast, li.MsoListParagraphCxSpLast, div.MsoListParagraphCxSpLast {mso-style-priority:34; mso-style-unhide:no; mso-style-qformat:yes; mso-style-type:export-only; margin-top:0in; margin-right:0in; margin-bottom:10.0pt; margin-left:.5in; mso-add-space:auto; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman";} span.Kop2Char {mso-style-name:"Kop 2 Char"; mso-style-priority:9; mso-style-unhide:no; mso-style-locked:yes; mso-style-link:"Kop 2"; mso-ansi-font-size:13.0pt; mso-bidi-font-size:13.0pt; font-family:"Cambria","serif"; mso-ascii-font-family:Cambria; mso-ascii-theme-font:major-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:major-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:major-latin; mso-bidi-font-family:"Times New Roman"; color:#4F81BD; mso-themecolor:accent1; font-weight:bold;} span.Kop3Char 
{mso-style-name:"Kop 3 Char"; mso-style-priority:9; mso-style-unhide:no; mso-style-locked:yes; mso-style-link:"Kop 3"; font-family:"Cambria","serif"; mso-ascii-font-family:Cambria; mso-ascii-theme-font:major-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:major-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:major-latin; mso-bidi-font-family:"Times New Roman"; color:#4F81BD; mso-themecolor:accent1; font-weight:bold;} span.TitelChar {mso-style-name:"Titel Char"; mso-style-priority:10; mso-style-unhide:no; mso-style-locked:yes; mso-style-link:Titel; mso-ansi-font-size:26.0pt; mso-bidi-font-size:26.0pt; font-family:"Cambria","serif"; mso-ascii-font-family:Cambria; mso-ascii-theme-font:major-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:major-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:major-latin; mso-bidi-font-family:"Times New Roman"; color:#17365D; mso-themecolor:text2; mso-themeshade:191; letter-spacing:.25pt; mso-font-kerning:14.0pt;} span.KoptekstChar {mso-style-name:"Koptekst Char"; mso-style-priority:99; mso-style-unhide:no; mso-style-locked:yes; mso-style-link:Koptekst; font-family:"Times New Roman","serif"; mso-bidi-font-family:"Times New Roman";} span.VoettekstChar {mso-style-name:"Voettekst Char"; mso-style-priority:99; mso-style-unhide:no; mso-style-locked:yes; mso-style-link:Voettekst; font-family:"Times New Roman","serif"; mso-bidi-font-family:"Times New Roman";} span.VoettekstChar1 {mso-style-name:"Voettekst Char1"; mso-style-noshow:yes; mso-style-priority:99; mso-style-unhide:no; font-family:"Times New Roman","serif"; mso-bidi-font-family:"Times New Roman";} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:Calibri; mso-bidi-theme-font:minor-latin;} .MsoPapDefault 
{mso-style-type:export-only; margin-bottom:10.0pt; line-height:115%;} /* Page Definitions */ @page {mso-footnote-separator:url("ThreadWeave_bestanden/header.htm") fs; mso-footnote-continuation-separator:url("ThreadWeave_bestanden/header.htm") fcs; mso-endnote-separator:url("ThreadWeave_bestanden/header.htm") es; mso-endnote-continuation-separator:url("ThreadWeave_bestanden/header.htm") ecs;} @page WordSection1 {size:8.5in 11.0in; margin:1.0in 1.0in 1.0in 1.0in; mso-header-margin:.5in; mso-footer-margin:.5in; mso-paper-source:0;} div.WordSection1 {page:WordSection1;} /* List Definitions */ @list l0 {mso-list-id:814688723; mso-list-type:hybrid; mso-list-template-ids:1895869328 67698689 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;} @list l0:level1 {mso-level-number-format:bullet; mso-level-text:\F0B7; mso-level-tab-stop:none; mso-level-number-position:left; text-indent:-.25in; font-family:Symbol;} @list l0:level2 {mso-level-number-format:bullet; mso-level-text:o; mso-level-tab-stop:none; mso-level-number-position:left; text-indent:-.25in; font-family:"Courier New"; mso-bidi-font-family:"Times New Roman";} @list l0:level3 {mso-level-number-format:bullet; mso-level-text:\F0A7; mso-level-tab-stop:none; mso-level-number-position:left; text-indent:-.25in; font-family:Wingdings;} @list l0:level4 {mso-level-number-format:bullet; mso-level-text:\F0B7; mso-level-tab-stop:none; mso-level-number-position:left; text-indent:-.25in; font-family:Symbol;} @list l0:level5 {mso-level-number-format:bullet; mso-level-text:o; mso-level-tab-stop:none; mso-level-number-position:left; text-indent:-.25in; font-family:"Courier New"; mso-bidi-font-family:"Times New Roman";} @list l0:level6 {mso-level-number-format:bullet; mso-level-text:\F0A7; mso-level-tab-stop:none; mso-level-number-position:left; text-indent:-.25in; font-family:Wingdings;} @list l0:level7 {mso-level-number-format:bullet; mso-level-text:\F0B7; mso-level-tab-stop:none; 
mso-level-number-position:left; text-indent:-.25in; font-family:Symbol;} @list l0:level8 {mso-level-number-format:bullet; mso-level-text:o; mso-level-tab-stop:none; mso-level-number-position:left; text-indent:-.25in; font-family:"Courier New"; mso-bidi-font-family:"Times New Roman";} @list l0:level9 {mso-level-number-format:bullet; mso-level-text:\F0A7; mso-level-tab-stop:none; mso-level-number-position:left; text-indent:-.25in; font-family:Wingdings;} @list l1 {mso-list-id:1059087720; mso-list-type:hybrid; mso-list-template-ids:-1340154646 -1245945194 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;} @list l1:level1 {mso-level-start-at:0; mso-level-number-format:bullet; mso-level-text:-; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:20.65pt; text-indent:-.25in; font-family:"Calibri","sans-serif"; mso-fareast-font-family:"Times New Roman"; mso-bidi-font-family:"Times New Roman";} @list l1:level2 {mso-level-number-format:bullet; mso-level-text:o; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:56.65pt; text-indent:-.25in; font-family:"Courier New"; mso-bidi-font-family:"Times New Roman";} @list l1:level3 {mso-level-number-format:bullet; mso-level-text:\F0A7; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:92.65pt; text-indent:-.25in; font-family:Wingdings;} @list l1:level4 {mso-level-number-format:bullet; mso-level-text:\F0B7; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:128.65pt; text-indent:-.25in; font-family:Symbol;} @list l1:level5 {mso-level-number-format:bullet; mso-level-text:o; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:164.65pt; text-indent:-.25in; font-family:"Courier New"; mso-bidi-font-family:"Times New Roman";} @list l1:level6 {mso-level-number-format:bullet; mso-level-text:\F0A7; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:200.65pt; text-indent:-.25in; font-family:Wingdings;} @list 
l1:level7 {mso-level-number-format:bullet; mso-level-text:\F0B7; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:236.65pt; text-indent:-.25in; font-family:Symbol;} @list l1:level8 {mso-level-number-format:bullet; mso-level-text:o; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:272.65pt; text-indent:-.25in; font-family:"Courier New"; mso-bidi-font-family:"Times New Roman";} @list l1:level9 {mso-level-number-format:bullet; mso-level-text:\F0A7; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:308.65pt; text-indent:-.25in; font-family:Wingdings;} ol {margin-bottom:0in;} ul {margin-bottom:0in;} --> </style> <!--[if gte mso 10]> <style> /* Style Definitions */ table.MsoNormalTable {mso-style-name:Standaardtabel; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:Calibri; mso-bidi-theme-font:minor-latin;} table.MsoTableGrid {mso-style-name:Tabelraster; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-priority:59; mso-style-unhide:no; border:solid windowtext 1.0pt; mso-border-alt:solid windowtext .5pt; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-border-insideh:.5pt solid windowtext; mso-border-insidev:.5pt solid windowtext; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman";} </style> 
<![endif]--><!--[if gte mso 9]><xml> <o:shapedefaults v:ext="edit" spidmax="1026"/> </xml><![endif]--><!--[if gte mso 9]><xml> <o:shapelayout v:ext="edit"> <o:idmap v:ext="edit" data="1"/> </o:shapelayout></xml><![endif]--> </head> <body lang=EN-US style='tab-interval:.5in'> <div class=WordSection1> <div style='mso-element:para-border-div;border:none;border-bottom:solid #4F81BD 1.0pt; mso-border-bottom-themecolor:accent1;padding:0in 0in 4.0pt 0in'> <p class=MsoTitle>ThreadWeave </p> </div> <h2>Introduction</h2> <p class=MsoNormal>In a nutshell, ThreadWeave aims to bring the type of functionality found in scipy.weave to the GPU, or SIMD architectures, more generally. That is, ThreadWeave allows one to write concise inline SIMD kernels directly in Python, while providing both complete imperative backwards compatibility, as well as a useful set of higher level abstractions.</p> <p class=MsoNormal>Most significantly, nd-arrays are first class citizens of the ThreadWeave world. They can be passed around, indexed, have their attributes read, and have boundary handling modes applied to them, all within the kernel.</p> <p class=MsoNormal>An example says more than a thousand words, so here are two short ones:</p> <table class=MsoTableGrid border=1 cellspacing=0 cellpadding=0 style='border-collapse:collapse;border:none;mso-border-alt:solid windowtext .5pt; mso-yfti-tbllook:1184;mso-padding-alt:5.75pt 5.75pt 5.75pt 5.75pt'> <tr style='mso-yfti-irow:0;mso-yfti-firstrow:yes;mso-yfti-lastrow:yes'> <td width=1277 style='width:6.65in;border:solid windowtext 1.0pt;mso-border-alt: solid windowtext .5pt;padding:5.75pt 5.75pt 5.75pt 5.75pt'> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'>import numpy as np<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'>from threadweave.backend import OpenCL as Backend<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier 
New"'>from threadweave.stencil import laplacian<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><o:p> </o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'>with Backend.Context(device = 0) as ctx:<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><o:p> </o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>laplacian_kernel = ctx.kernel("""<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>(<type>[d,n,m] output) << [d,n,m] << (<type>[d,n,m] input):<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>n:<span style='mso-spacerun:yes'> </span>serial<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>d:<span style='mso-spacerun:yes'> </span>variable<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>input:<span style='mso-spacerun:yes'> </span>padded(stencil)<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>{<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span><type> r = 0;<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>for (s in stencil(input))<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>r += input(s.d,s.n,s.m) * s.weight;<o:p></o:p></span></p> <p 
class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>output(d,n,m) = r;<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>}""")<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><o:p> </o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>shape = (2,6,6)<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>input = ctx.array(np.arange(np.prod(shape)).astype(np.float32).reshape(shape) ** 2)<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>print input<span style='mso-spacerun:yes'> </span>#some arbitrary input data<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><o:p> </o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>stencil = laplacian(2,5)[np.newaxis,:,:] #construct 2-dim 5-pts laplacian, with broadcasting<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>output = laplacian_kernel(input, stencil = stencil)<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>print output</span></p> </td> </tr> </table> <p class=MsoNormal><o:p> </o:p></p> <p class=MsoNormal>Without going into the details; this example shows the declaration and invocation of a Laplacian filter applied to a stack of 2d arrays; these few lines of code are converted into 70 lines of dense CUDA or OpenCL boilerplate. 
Of course good libraries exist already to perform convolutions; the added value, aside from demonstration purposes, is that ThreadWeave allows for greatly expanded flexibility. By acting on a stack of 2d arrays, we can expose additional parallelism, and we have full control over boundary handling, as well as the way in which axes are iterated over. Also, the iteration over the stencil is transparently unrolled.</p> <p class=MsoNormal>Aside from eliminating boilerplate from imperative C code, the functionality at the core of ThreadWeave also makes for a good platform for building higher level abstractions:</p> <table class=MsoTableGrid border=1 cellspacing=0 cellpadding=0 style='border-collapse:collapse;border:none;mso-border-alt:solid windowtext .5pt; mso-yfti-tbllook:1184;mso-padding-alt:5.75pt 5.75pt 5.75pt 5.75pt'> <tr style='mso-yfti-irow:0;mso-yfti-firstrow:yes;mso-yfti-lastrow:yes'> <td width=1277 style='width:6.65in;border:solid windowtext 1.0pt;mso-border-alt: solid windowtext .5pt;padding:5.75pt 5.75pt 5.75pt 5.75pt'> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'>import numpy as np<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'>from threadweave.backend import CUDA as Backend<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><o:p> </o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'>with Backend.Context(device = 0) as ctx:<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>#some example data<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>shape = 2,3,4<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>A = 
ctx.arange(np.prod(shape),np.float32).reshape(shape)<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>shape = 2,4,3<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>B = ctx.arange(np.prod(shape),np.float32).reshape(shape)<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><o:p> </o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>#declare product, contracted over j<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>stacked_matrix_product = ctx.tensor_product('nij,njk->nik')<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>C = stacked_matrix_product(A, B)<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><o:p> </o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>print C<o:p></o:p></span></p> <p class=MsoNoSpacing><span style='font-size:8.0pt;font-family:"Courier New"'><span style='mso-spacerun:yes'> </span>print ctx.tensor_product('nii->ni')(C)<span style='mso-spacerun:yes'> </span>#print the diagonals of C too, for good measure<o:p></o:p></span></p> </td> </tr> </table> <p class=MsoNormal><o:p> </o:p></p> <p class=MsoNormal>This short example declares and invokes a contracting tensor product; more specifically, it multiplies a stack of matrices. 
The function ctx.tensor_product is a work-alike of the invaluable numpy.einsum, and can be used to concisely express inner products, outer products, diagonal extractions, transposes, and any other operation that can be expressed as a tensor product with the Einstein summation convention.</p> <p class=MsoNormal>By building on top of the abstractions at the core of ThreadWeave, the functionality in ctx.tensor_product can be implemented in just a handful of lines (in its current fully functional yet suboptimal form, that is).</p> <p class=MsoNormal>To give an impression of how this happens: both tensor_product and laplacian_kernel have the same type, KernelFactory. A KernelFactory wraps a KernelDeclaration and a cache of compiled KernelInstances, which are created by letting a backend-specific CodeGenerator act on a KernelDeclaration for a given set of template arguments.</p> <p class=MsoNormal>You get the idea: much of what goes on behind the scenes is identical for both examples. The methods Context.prod and Context.kernel merely wrap two different front-ends for parsing their respective grammars into a KernelDeclaration; many more can be created with relatively little effort to cater to specific wants. 
This frees one from grave discussions, with oneself or others, over syntactical details; if your problem domain asks for different syntax, it should be easy to implement on top of ThreadWeave.</p> <p class=MsoNormal><o:p> </o:p></p> <h2>Feature summary</h2> <p class=MsoNormal>Here is a concise listing of the core features that ThreadWeave provides at present.</p> <p class=MsoListParagraphCxSpFirst style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Nd-array awareness:</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:56.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level2 lfo1'><![if !supportLists]><span style='font-family:"Courier New";mso-fareast-font-family:"Courier New"'><span style='mso-list:Ignore'>o<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Passing nd-arrays in and out of kernels</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:56.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level2 lfo1'><![if !supportLists]><span style='font-family:"Courier New";mso-fareast-font-family:"Courier New"'><span style='mso-list:Ignore'>o<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Indexing of nd-arrays</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:56.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level2 lfo1'><![if !supportLists]><span style='font-family:"Courier New";mso-fareast-font-family:"Courier New"'><span style='mso-list:Ignore'>o<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Various boundary handling modes (padded, wrapped, clamped, fixed)</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:56.65pt;mso-add-space: 
auto;text-indent:-.25in;mso-list:l1 level2 lfo1'><![if !supportLists]><span style='font-family:"Courier New";mso-fareast-font-family:"Courier New"'><span style='mso-list:Ignore'>o<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Accessing of array attributes such as shape with numpythonic syntax.</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Axes:</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:56.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level2 lfo1'><![if !supportLists]><span style='font-family:"Courier New";mso-fareast-font-family:"Courier New"'><span style='mso-list:Ignore'>o<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Declaratively relate kernel and array axes to one another. This gives a highly flexible and compact notation.</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:56.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level2 lfo1'><![if !supportLists]><span style='font-family:"Courier New";mso-fareast-font-family:"Courier New"'><span style='mso-list:Ignore'>o<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Decoupling between the grid of work items to be performed, and the grid of threads that perform them. 
This allows us to overcome hardware limitations on threadgrids, but more importantly, allows one to declaratively optimize the relation between the work to be performed, and the arrangement of threads to accomplish this goal.</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Runtime features:</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:56.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level2 lfo1'><![if !supportLists]><span style='font-family:"Courier New";mso-fareast-font-family:"Courier New"'><span style='mso-list:Ignore'>o<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>JIT compilation of kernels templated on shape, type and stencil arguments</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:56.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level2 lfo1'><![if !supportLists]><span style='font-family:"Courier New";mso-fareast-font-family:"Courier New"'><span style='mso-list:Ignore'>o<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Runtime type & shape validation of kernel arguments</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:56.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level2 lfo1'><![if !supportLists]><span style='font-family:"Courier New";mso-fareast-font-family:"Courier New"'><span style='mso-list:Ignore'>o<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Output shape inference and allocation (if not given to the kernel)</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:56.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level2 lfo1'><![if !supportLists]><span 
style='font-family:"Courier New";mso-fareast-font-family:"Courier New"'><span style='mso-list:Ignore'>o<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Computation of sensible defaults for grid/block parameters</p> <p class=MsoListParagraphCxSpLast style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Abstracting away the differences between different backends; they can be switched out by changing a single import. As of now, CUDA and OpenCL are supported. This applies to both the Python side of things, where ThreadPy presents a uniform interface, as well as papering over the superficial syntactical differences between the two C-dialects, like atomicAdd / atomic_add. </p> <p class=MsoNormal><o:p> </o:p></p> <h2>ThreadWeave Concepts</h2> <h3>Nd-arrays</h3> <p class=MsoNormal>The two primary SIMD languages, CUDA and OpenCL, are both C dialects, which is the natural choice in many ways. However, C was never designed with SIMD in mind; and although both dialects offer some abstractions for dealing with multiple threads, one is left with the minimal array abstractions C offers for the multiple data part of the equation. 
By extending C with nd-array capabilities, we not only gain the immediate benefits of this tried and true abstraction, but also enable the introduction of novel concepts aimed at harnessing the typically tight relationship between the grid of threads and the data they act on.</p> <p class=MsoNormal>ThreadWeave facilitates working with nd-arrays both within the kernel code and during declaration and invocation.</p> <p class=MsoNormal>In order to provide access to arrays within the kernel, a pointer to the device memory is passed into the kernel (shocker). In addition, all shape information is passed into the kernel, either at compile time or at runtime. This information is then used to compute all strides, which in turn are used to provide clean syntax for indexing nd-arrays. All the array properties passed into or computed within the kernel are accessible within the kernel by means of numpythonic syntax.</p> <p class=MsoNormal>Each array axis can be given a boundary handling mode. By default, array access is not bounds checked. But when working with stencil operations, for instance, one needs a consistent way of handling array boundaries. All typical boundary handling modes are planned functionality (clamped, fixed, reflected); for now, ThreadWeave only supports wrapped and padded boundaries. An array marked as padded (with a given stencil) is treated as a view upon a larger array (with enough padding such that all access through the stencil is safe). This is a very flexible way of specifying boundary conditions, and it is also the most efficient in terms of overhead inside the kernel.</p> <p class=MsoNormal>At the moment, only C-contiguous arrays are supported (plus a restricted notion of views on these, in the form of padded arrays). Support for arbitrary strides would be nice to have, but the nd-array classes included in the current stable branch of pycuda and pyopencl do not support them. 
This functionality is, however, around the corner in the form of the package compyte, which provides a closer analogue of numpy's ndarray, with a shared code base between pycuda and pyopencl. It's probably best to wait for the public release of compyte and its integration into pycuda/pyopencl rather than reinventing that wheel.</p> <h3>Kernels and axes</h3> <p class=MsoNormal>In ThreadWeave, kernels are declared as having an arbitrary number of axes, with these axes spanning a grid of repetitive work items, which all have the same kernel body applied to them. As opposed to the typical SIMD model, not all these work items are necessarily performed in parallel; many kernels benefit from having a thread act serially along an axis, so as to minimize thread launch overhead and maximize reuse of computed terms that are constant between work items. This contrasts with the CUDA/OpenCL model, where the kernel grid has a one-to-one mapping to a grid of physical threads.</p> <p class=MsoNormal>An axis can have its iteration type set to serial, parallel, or hybrid. Hybrid iteration means that the work items along the axis are divided among a number of threads, each of which is responsible for looping over a part of the axis; the other two iteration modes are the logical extremes of that scenario. By being able to declaratively change an axis's iteration behavior, one can rapidly find a balance between the benefits of exposing parallelism and its drawbacks, without rewriting a single line of kernel body code.</p> <p class=MsoNormal>Within a kernel declaration, we can create size relations between kernel axes and array axes. This allows the kernel shape and the output arguments' shapes to be deduced from the input arguments' shapes. 
In typical use, most array axes are directly related to a kernel axis in this manner.</p> <h3>Stencil operations</h3> <p class=MsoNormal>Stencil operations are a staple of GPU computing, and as such, ThreadWeave attempts to accommodate them. Most importantly, ThreadWeave supports special syntax for iterating over (unrolled) stencils. The syntax is "for (voxel_id in stencil_id)", where any property of the individual voxels of a stencil can be accessed through the loop dummy variable, as in voxel_id.weight, for instance.</p> <p class=MsoNormal>For a comprehensive hands-on tutorial of stencil operations in ThreadWeave, see the watershed segmentation unit test in the source.</p> <h3>Context objects</h3> <p class=MsoNormal>ThreadWeave adds a thin wrapper around native pycuda/pyopencl context objects. This is useful for abstracting away some of the superficial differences between the two, as well as hiding some of their rather C-like characteristics behind a more pythonic interface. This context object aims to be a work-alike of the numpy namespace, providing a comprehensive toolbox of functionality in a familiar interface. For more information, see ThreadPy.</p> <p class=MsoNormal><o:p> </o:p></p> <h2>Frontend Syntax</h2> <p class=MsoNormal>ThreadWeave at present supports several front-end syntaxes for accessing its functionality: a general imperative syntax, which allows for the most flexibility, and two more derived ways of declaring kernels, for pure elementwise kernels and tensor products.</p> <h3>General imperative syntax</h3> <p class=MsoNormal>The imperative syntax is largely familiar C code, with some extensions. The most visible difference is in the signature of the kernel declaration. 
Writing out every detail of this syntax would be more cluttered than simply reading the relevant parser definitions in the source (which are quite concise and readable, thanks to pyparsing), and arguably the syntax is most easily grasped by looking at the examples. Nonetheless, a coarse description of the syntax is given here.</p> <p class=MsoNormal>The global structure of the syntax is as follows:</p> <p class=MsoNormal><span style='font-size:8.0pt;line-height:115%;font-family: "Courier New"'>Colon_terminated_signature_line:<o:p></o:p></span></p> <p class=MsoNormal style='text-indent:.5in'><span style='font-size:8.0pt; line-height:115%;font-family:"Courier New"'>Indented_annotations_block<o:p></o:p></span></p> <p class=MsoNormal><span style='font-size:8.0pt;line-height:115%;font-family: "Courier New"'>Curly_braced_body<o:p></o:p></span></p> <p class=MsoNormal>The signature has the following grammar:</p> <p class=MsoNormal><span style='font-family:"Courier New"'>(output arguments) << [kernel axes] << (input arguments)<o:p></o:p></span></p> <p class=MsoNormal>This is to be read as "the input arguments mapped over a kernel of the given shape yield the output arguments".</p> <p class=MsoNormal>Each kernel axis identifier should be a single lowercase character. Constraints on the size of an axis can be written inline with its declaration.</p> <p class=MsoNormal>Arguments have the form <span style='font-family:"Courier New"'>typename[array axes] array_identifier</span>, with an optional assignment of default values for output arguments (a constant value or a copy of an input argument).</p> <p class=MsoNormal>The array and axis identifiers declared in the signature can be further specified with annotations in the annotations section that follows the signature.</p> <p class=MsoNormal>For axes, we can specify an iteration mode (serial, parallel, hybrid) and a binding behavior (compile-time or run-time). 
By default, templated size arguments bind at compile time; this leads to fewer arguments to pass in, and enables more compiler optimizations. However, while kernels are typically called often with the same size parameters, this need not be the case; template arguments can be annotated as variable to defer binding of their size value to runtime, and so avoid excessive compilation overhead.</p> <p class=MsoNormal>For arrays, the only possible annotation for now is a boundary handling mode, but more features are planned, such as the ability to specify caching behavior.</p> <p class=MsoNormal>The syntax was designed so that the signature, being the public interface of the kernel, prominently displays the important information and lucidly summarizes the relationship between the nd-kernel and the nd-arrays it acts upon. The annotation section defines additional properties of axes and arrays declared on the signature line. It would be possible to allow such annotations to be written inline with the signature, but considering the number of possible (future) keywords, this could get rather cluttered; moreover, these annotations are of at most secondary semantic significance: they are more about the how than the what.</p> <p class=MsoNormal>The last section is the kernel body. Any valid C code is valid body code, but in addition, a set of preprocessing transformations is applied. 
The recognized constructs are:</p> <p class=MsoListParagraphCxSpFirst style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Array indexing <span style='mso-tab-count:1'> </span>(round brackets, to avoid conflicts with native C indexing)</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Array properties <span style='mso-tab-count: 1'> </span>(.shape[], .strides[], .size, and so forth)</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Stencil iteration <span style='mso-tab-count: 1'> </span>("for (voxel_id in stencil_id) {}")</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Voxel properties<span style='mso-tab-count:1'> </span>(voxel_id.axis_id, 
voxel_id.weight)</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Stencil properties <span style='mso-tab-count: 1'> </span>(stencil_id.size gives the number of active (non-masked) voxels)</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Integer range iteration ("for (dummy in 0:array_id.shape[1])")</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Numpythonic type names<span style='mso-tab-count: 1'> </span>("uint64" just looks so much better than "unsigned long long")</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Template types<span style='mso-tab-count:1'> </span>(angle bracketed template identifiers. 
To be used in both body and signature)</p> <p class=MsoListParagraphCxSpLast style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Backend-specific translations<span style='mso-tab-count:1'> </span>(atomic operations, synchronization statements)</p> <h3>Elementwise kernels</h3> <p class=MsoNormal>This provides an nd-array aware, and thus broadcastable equivalent of pycuda/pyopencl elementwise kernels. Basic usage is shown in the examples folder. </p> <h3>Tensor products</h3> <p class=MsoNormal>The syntax and use is entirely similar to numpy.einsum, aside from the fact that ThreadWeave employs a separate declaration and invocation stage. The syntax is likely to be enriched in the future, for instance to cleanly specify axes size hints or iteration modes. So far this function is more a demonstration of concept than something that is efficient in many circumstances; though non-contracting products should work well enough. Again, basic usage is demonstrated in the examples folder.</p> <p class=MsoNormal><o:p> </o:p></p> <h2>Software design</h2> <p class=MsoNormal>Here is a short conceptual description of the classes in ThreadWeave</p> <p class=MsoNormal>Class overview:</p> <p class=MsoListParagraphCxSpFirst style='text-indent:-.25in;mso-list:l0 level1 lfo2'><![if !supportLists]><span style='font-family:Symbol;mso-fareast-font-family:Symbol;mso-bidi-font-family: Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>KernelDeclaration: Contains all information that defines the kernel. 
It provides a rich set of accessor properties, such that other pieces of code that depend on it, like the CodeGenerator and runtime components, can be written in a readable and safe manner. Also, it presents an interface for building declaration objects in a concise and consistent manner; see parsing_tensor.py and parsing_declaration.py for an example of use. <br> Building KernelDeclarations in end-user code is entirely possible as well, but so far the power of the front-end declaration language has been able to keep pace with the features in the KernelDeclaration, so there should not be any real use for this.</p> <p class=MsoListParagraphCxSpMiddle style='text-indent:-.25in;mso-list:l0 level1 lfo2'><![if !supportLists]><span style='font-family:Symbol;mso-fareast-font-family:Symbol;mso-bidi-font-family: Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>KernelFactory: The public Kernel interface one is dealing with in typical use. It transparently handles JIT compilation of its associated KernelDeclaration, based on inferred or explicitly specified template arguments.</p> <p class=MsoListParagraphCxSpMiddle style='text-indent:-.25in;mso-list:l0 level1 lfo2'><![if !supportLists]><span style='font-family:Symbol;mso-fareast-font-family:Symbol;mso-bidi-font-family: Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>KernelInstance: Kernel object compiled against a complete set of template arguments. 
It provides methods for runtime argument checking, collection of required runtime arguments, and automatic thread/block assignment.</p> <p class=MsoListParagraphCxSpLast style='text-indent:-.25in;mso-list:l0 level1 lfo2'><![if !supportLists]><span style='font-family:Symbol;mso-fareast-font-family:Symbol;mso-bidi-font-family: Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>CodeGenerator: Class responsible for translating a KernelDeclaration into valid backend-specific code. Most lines of code are in a general C_Code_Generator base class; the backend-specific parts are delegated to subclasses.</p> <p class=MsoNormal><o:p> </o:p></p> <h2>Relation to other projects</h2> <p class=MsoNormal>In terms of overall intent and approach, ThreadWeave is most similar to scipy.weave, from which it derives its name. ThreadWeave is built as a layer on top of pycuda and pyopencl; in fact, ThreadWeave was inspired by the elementwise function from these packages, which is similar in intent, if much more limited in functionality.</p> <p class=MsoNormal>There exist many Python libraries for facilitating SIMD programming with higher-level concepts; notable ones are loo.py, Theano, copperhead (seems to be inactive), and CUDA Thrust (as used within pycuda).</p> <p class=MsoNormal>They all share an intention to bring functional programming techniques to the GPU, which is a sensible aim, given the highly formulaic nature of much C SIMD code. However, not every SIMD programming problem can be naturally expressed in a functional framework, and often some hacky C code is the more natural solution, especially considering the performance-oriented nature of most applications. Most of the repetitive nature of kernel code can be eliminated with a simple C preprocessor, which is the approach ThreadWeave takes. 
If you cannot fit your problem into one of ThreadWeave's abstractions, that does not mean it is back to square one of plain old C; you simply handcode that part, while still enjoying its other conveniences. This is the niche that ThreadWeave aims to fill.</p> <p class=MsoNormal>ThreadWeave aspires to expand its higher-level and more functional layer in the future. It is my hope that ThreadWeave and loo.py become integrated. They both have their own application domains, but they also share a lot of functionality and intent. Ideally, loo.py could be built on top of ThreadWeave, relying on it for backend independence, front-end syntax, nd-array handling and other basics like JIT compilation of template arguments, while bringing its excellent functional programming capabilities to a wider audience. </p> <p class=MsoNormal>Theano in turn (another Python package much beloved by me) might in an ideal world be built on top of such a software stack. At present the Theano development team has to maintain its own GPU nd-array kernel code, while without a doubt, that is a cause worthy of its own development team. Whether this package will ever become general and robust enough to serve such needs, I do not know, but this is certainly the direction that I believe this project should take.</p> <h2><o:p> </o:p></h2> <h2>Planned functionality</h2> <p class=MsoNormal>This project has so far been a one-man effort. It has gone through several rewrites from the ground up, and I would not be surprised if it went through a few more, after going public and receiving feedback. At present, ThreadWeave does what I set out for it to do, but as it has become a bit of a goal in itself, I expect to be actively developing it in the future. 
</p> <p class=MsoNormal>Here is a short list of the functionality I feel is currently missing, and which I might add in the near future, or would encourage others to help with; but I am sure many obvious features are missing, and your input is much appreciated:</p> <p class=MsoListParagraphCxSpFirst style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>A natural way to include auxiliary functions, or other pieces of C code. Of course one can just append anything desirable to the source string, but it would be nice to have the ability to pass arrays around, and so forth. </p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Support for arbitrarily strided arrays. This should be a breeze to add, once the latest compyte nd-array is merged with pycuda and pyopencl. Also, if such is supported, it would be good to be able to place declarative constraints on the ordering of arrays; either to ensure an input argument has the expected layout, and cast or raise otherwise, or to allocate an output array in a specific manner. 
Control over strides is important, considering their implications for memory coalescing.</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Support for various array caching schemes. Two very common cases come to mind where shared memory makes the difference, and which should not be too hard to generalize and abstract away: a front of threads iterating through an array to perform a stencil operation, and matrix multiplication or transpose type operations. Also, small and frequently accessed arrays may be hoisted into shared memory entirely, to take pressure off the cache.</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Support for arrays of structs, and structs of arrays. That includes complex arrays of both kinds. It would be nice if both were supported elegantly on both the Python and kernel side of things; i.e., automatic struct declaration generation from the Python typedef, and structs of arrays being passable into the function as a single entity. 
Perhaps it is a bit too voodoo, but there is something to be said for translating array_id[index].field to array_id_field[index] if array_id is a struct of arrays, so memory layout can be switched declaratively, without changing a single line of body code, analogous to the way strides abstract away memory layout.</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>A rich way of specifying symbolic constraints between axes sizes. At present, only equal and constant size constraints are fully supported. But it would be nice to support arbitrary arithmetic relations and inequalities between the array and kernel axes; for instance, when tiling an array, the output dimensions are N times the input dimensions; when differentiating along an axis, that dimension shrinks by one; and so forth (the latter can actually be expressed as a stencil operation, size constraints and all). One could use sympy to solve these symbolic relations at declaration time for the missing quantities, and derive explicit expressions that can efficiently be evaluated by the runtime components.</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Improvements to grid and block allocations. The current implementation works, but is not particularly intelligent. 
Considering how problem-dependent this is, there should probably be the option to choose from different strategies during kernel declaration, and to provide a manual override as well.</p> <p class=MsoListParagraphCxSpMiddle style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Provide virtual parallel axes. CUDA supports only up to three parallel axes (or two for old hardware). It has not been a practical concern yet, but there should not be a limit to their number. If there are more parallel axes than hardware axes, multiple axes should be packed into a single hardware axis. Similarly, the length of any particular axis should not be limited by hardware constraints, and should be virtualized where necessary.</p> <p class=MsoListParagraphCxSpLast style='margin-left:20.65pt;mso-add-space: auto;text-indent:-.25in;mso-list:l1 level1 lfo1'><![if !supportLists]><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:Calibri; mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri'><span style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]>Provide syntax to explicitly structure code involving serial axes iteration. By default, the whole kernel is the loop body, but we may wish to place code before, in, or after any given axis loop. 
Something like "axis(axis_identifier) {body}" should do the trick, where this construct is translated into the actual for loop over the axis, or is simply taken out in case of a parallel axis.<br> It is probably more typical to wish to structure code relative to the whole of all serial axes iteration though (declaring a dummy, serially reducing, and then atomically reducing an array comes to mind). Just "for {body}" or something like it should work for that. Both syntaxes could coexist.</p> <h2>Acknowledgement</h2> <p class=MsoNormal>Andreas Kloeckner, of pycuda and pyopencl fame, whose work makes all of this possible.</p> <p class=MsoNormal>Paul McGuire, for the awesome pyparsing, without which starting a project of this kind would never have crossed my mind.</p> <p class=MsoNormal>And the terrific Python community as a whole, to whom I hope to repay a small part of my debt.</p> <h2>Terms of use</h2> <p class=MsoNormal>To do with as you please. Acknowledgement or citations, where applicable, are appreciated. Ideas and suggestions for improvements are always welcome, as are actual improvements themselves.</p> </div> </body> </html>
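The "virtual parallel axes" item under Planned functionality describes packing several parallel axes into a single hardware axis. The index arithmetic this would need can be sketched in plain Python. This is only an illustration, not ThreadWeave code: the names pack_index and unpack_index are hypothetical, and row-major ordering of the packed axes is an assumption.

```python
from math import prod


def pack_index(indices, shape):
    """Fold one index per virtual axis into a single flat hardware index.

    Hypothetical helper; assumes row-major order (last axis varies fastest).
    """
    flat = 0
    for idx, size in zip(indices, shape):
        if not 0 <= idx < size:
            raise IndexError("index out of range for its axis")
        flat = flat * size + idx
    return flat


def unpack_index(flat, shape):
    """Recover the per-axis indices from a flat hardware index."""
    indices = []
    for size in reversed(shape):
        flat, idx = divmod(flat, size)
        indices.append(idx)
    return tuple(reversed(indices))


# Three virtual axes packed into one hardware axis of length prod(shape).
shape = (2, 3, 4)
hardware_axis_length = prod(shape)
flat = pack_index((1, 2, 3), shape)
roundtrip = unpack_index(flat, shape)
```

In a generated kernel, the equivalent of unpack_index would run at the top of the kernel body, turning the single hardware thread index back into the per-axis indices that the body code refers to.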