Hamosch

Member · 86 posts

Everything posted by Hamosch

  1. You just compile the source for the other OS; on Linux, for example, you usually use the GCC/g++ compiler. C/C++ is not inherently locked to any OS, that lock-in is introduced by the programmer using OS-specific libraries and calls.
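     As a minimal sketch (the file name and build commands below are just examples, not from the original post): as long as the source sticks to the standard library it builds unchanged on each OS, only the compiler invocation differs.

         // hello.cpp -- uses only the C++ standard library, so nothing in it is tied to an OS.
         // Linux:   g++ -o hello hello.cpp
         // Windows: MinGW's g++ with the same command, or MSVC's "cl hello.cpp"
         #include <iostream>

         int main() {
             std::cout << "Same source, compiled natively for each OS\n";
             return 0;
         }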
  2. Well, my C++ code is not really a fair comparison as I'm using both loop unrolling and the Intel AVX (Advanced Vector Extensions) instructions to get comparably very high throughput, completing 8*4=32 iterations (or adding 64 to the counter) each loop. Loop unrolling increases throughput because there are far fewer branches, and with AVX I can exploit the parallelism in the problem to execute 4 operations in one instruction, in SIMD fashion. Everyone else uses a serial approach with no unrolling, so there's no question as to why my code is faster, regardless of it being C++. I really just wanted to see what kind of performance I could cram out compared to the naive solution. But have fun, I just wouldn't compare my result of 130 times faster against the others, who all use the same approach as each other.
  3. I'll do you one better:

         #include <bits/stdc++.h>
         #include <x86intrin.h>

         long long t = -1;

         int main() {
             std::chrono::seconds time(1);
             auto start = std::chrono::high_resolution_clock::now(), cur = start;
             alignas(32) double p[4];                          // 32-byte aligned so _mm256_store_pd is valid
             __m256d a = _mm256_setzero_pd();                  // four partial sums of 4/1 - 4/3 + 4/5 - 4/7 + ... = pi
             __m256d k = _mm256_set1_pd(4.0);                  // numerator
             __m256d b = _mm256_set_pd(1.0, -3.0, 5.0, -7.0);  // current denominator in each lane
             __m256d c = _mm256_set_pd(8.0, -8.0, 8.0, -8.0);  // denominator step per unrolled step
             while (cur - start <= time) {
                 // unrolled 8 times, 4 terms per add = 32 series terms per pass
                 a = _mm256_add_pd(a, _mm256_div_pd(k, b)); b = _mm256_add_pd(b, c);
                 a = _mm256_add_pd(a, _mm256_div_pd(k, b)); b = _mm256_add_pd(b, c);
                 a = _mm256_add_pd(a, _mm256_div_pd(k, b)); b = _mm256_add_pd(b, c);
                 a = _mm256_add_pd(a, _mm256_div_pd(k, b)); b = _mm256_add_pd(b, c);
                 a = _mm256_add_pd(a, _mm256_div_pd(k, b)); b = _mm256_add_pd(b, c);
                 a = _mm256_add_pd(a, _mm256_div_pd(k, b)); b = _mm256_add_pd(b, c);
                 a = _mm256_add_pd(a, _mm256_div_pd(k, b)); b = _mm256_add_pd(b, c);
                 a = _mm256_add_pd(a, _mm256_div_pd(k, b)); b = _mm256_add_pd(b, c);
                 t += 64;
                 cur = std::chrono::high_resolution_clock::now();
             }
             // horizontal sum of the four lanes into p[0]
             a = _mm256_hadd_pd(a, _mm256_permute2f128_pd(a, a, 1));
             a = _mm256_hadd_pd(a, a);
             _mm256_store_pd(p, a);
             std::cout << p[0] << '\n';
             std::cout << t << '\n';
             return 0;
         }

     Loop unrolled and with vector instructions, compiled with -O3 and -mavx:
     original python: 6,000,000
     optimized c++: 810,000,000
     Still only running on one core on a stock 4770k.
  4. Nope, that would only check if the day number is smaller than or equal to 0:
     theDayNumber <= 0 ---- day number smaller than or equal to zero
     theDayNumber <= 6 ---- day number smaller than or equal to 6
     Both of these together are only true if the number is smaller than or equal to 0.
     A good way to remember how the smaller-than (<) and larger-than (>) signs work is that the sign itself has a big and a small side. The value on the large, open side of the sign is the one that has to be larger than the value on the small side:
     if (large_side > small_side) { ---- if large_side is greater than small_side
     if (small_side < large_side) { ---- if small_side is smaller than large_side
     These are the same condition, just showing that you can flip it around however you want. By default < and > are non-inclusive; adding the = includes the case where both sides are equal.
     So your condition should be:
     theDayNumber >= 0 && theDayNumber <= 6
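     Dropped into an if statement it looks like this (just a sketch; the variable name is taken from your snippet, the rest is assumed):

         #include <iostream>

         int main() {
             int theDayNumber = 3;                          // e.g. 0 = Sunday ... 6 = Saturday
             if (theDayNumber >= 0 && theDayNumber <= 6) {  // inclusive on both ends
                 std::cout << "valid day number\n";
             } else {
                 std::cout << "invalid day number\n";
             }
             return 0;
         }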
  5. All of my C/C++ experience is on *nix systems so I'm not too familiar with Windows threading, but things like OpenMP and OpenCL are cross-platform so... You could implement a thread pool yourself and use some kind of task-based parallelism, where you create tasks that are processed by the available threads in a first-come, first-served fashion. But OpenMP and OpenCL (I think) have task-based parallelism built in out of the box (https://computing.llnl.gov/tutorials/openMP/#Task), which you could take advantage of. Basically the idea is to create tasks consisting of rendering and handling game events, and the free threads grab the tasks as they become available. The threads complete the tasks but don't exit; instead they go into a waiting state, waiting for new tasks to become available.
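     Roughly what that looks like with OpenMP tasks (a sketch only; render_frame and handle_events are placeholder functions, not from any real engine, and you'd compile with -fopenmp):

         #include <cstdio>
         #include <omp.h>

         void render_frame(int frame)  { std::printf("render %d on thread %d\n", frame, omp_get_thread_num()); }
         void handle_events(int frame) { std::printf("events %d on thread %d\n", frame, omp_get_thread_num()); }

         int main() {
             #pragma omp parallel               // the pool: threads are created once and reused
             {
                 #pragma omp single             // one thread creates the tasks...
                 {
                     for (int frame = 0; frame < 4; ++frame) {
                         #pragma omp task       // ...and any idle thread in the pool picks them up
                         render_frame(frame);
                         #pragma omp task
                         handle_events(frame);
                         #pragma omp taskwait   // both tasks done before moving to the next frame
                     }
                 }
             }
             return 0;
         }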
  6. Why not just whip up a minimal Debian/Arch install (or whatever distro you like) and run KVM (+ LXC and/or Docker for containers) on it? Completely open source and well documented. You could also use one of the many hypervisor distros that come with KVM and LXC plus some web GUI ready to go.
  7. You should make sure you understand what a database is and how you interact with it. @Nuluvius already linked a great resource on the integration in Python, but it seems you should also read up on databases and SQL (Structured Query Language), which is the way you interact with the database. There are some frameworks/libraries that abstract database queries into functions (Django, etc.), but you're strongly encouraged to learn some SQL if you want to work with databases.
  8. It might work. According to ASUS https://www.asus.com/Commercial-Servers-Workstations/Z8NRD12/specifications/ the board supports DDR3 RDIMM/UDIMM with ECC, but nothing can be promised unless you go with something on the QVL of your board: http://dlcdnet.asus.com/pub/ASUS/mb/socket1366/Z8NR-D12/Manual&QVL/Z8NR-D12_memory_QVL_20121203.pdf
  9. Yes, the reason that Java code is extremely portable is that it compiles into a specific Java instruction set (Java assembly/machine code, if you will). This instruction set is not specific to the machine it is run or compiled on, but universal to Java. The instruction set is actually a stack-based one, contrary to the register-based one of x86. Some people argue that the reason for choosing a stack-based instruction set is that the compiled binary is smaller than its register-based counterpart; this was an advantage when Java was developed, as the internet was still young and transfer rates were very low, so it was easier to share Java programs than their counterparts because the binary was smaller.

     So, back to why it's portable: the Java bytecode is run in the Java Virtual Machine, which at run time translates the Java instructions to the underlying hardware's instruction set. This means that whenever there is a new hardware platform with a different instruction set, the only thing we need to do is write a new version of the Java VM that runs on that platform. Then every Java program will run on that platform too, without any recompilation or changes required. THIS IS THE POWER OF JAVA. This is also the reason that many in-house corporate applications are written in Java, as they don't need to worry about compatibility when moving from platform to platform.

     In theory you should be able to write a Java VM that runs without an OS, but I don't think this has been done, or that there is a reason to do so.
  10. It is because the operating system implements and manages virtual memory. The operating system maintains the page tables that map each process's virtual memory addresses to physical memory addresses. So if the operating system is 32-bit, it only gives programs 32 bits of virtual address space and can itself only address 32 bits (2^32 bytes = 4 GB) of physical address space. Therefore, even if the CPU can address more than 32 bits of physical address space, the page tables never map anything beyond that (because the OS can't address it), and the CPU will only ever work with 4 GB of memory. This is also why the fix for this is called PAE (Physical Address Extension), i.e. the OS can address more than 32 bits of physical addresses.
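      Just to put numbers on it (a quick sketch; 36 bits is the physical address width PAE introduced on x86, later CPUs support more):

          #include <cstdint>
          #include <iostream>

          int main() {
              std::uint64_t plain32 = 1ULL << 32;   // 32-bit addresses: 4 GiB addressable
              std::uint64_t pae36   = 1ULL << 36;   // PAE widens physical addresses to 36 bits: 64 GiB
              std::cout << (plain32 >> 30) << " GiB without PAE, "
                        << (pae36 >> 30) << " GiB with PAE\n";
              return 0;
          }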
  11. All programs run from and must access RAM. Where else would they run from, and what else would they access? And don't say caches: caches are part of the memory hierarchy used to hide the latency of RAM, they are implemented in hardware, and they don't care what language you write in. Also, writing cache-aware code can be done in any language (i.e. taking advantage of spatial and temporal locality, etc.). Not gonna get involved in the rest of the discussion, everyone has a preferred language and they have different usage areas.
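      A classic illustration of spatial locality (a sketch; the same effect shows up in any language that lays arrays out row-major):

          #include <cstddef>
          #include <iostream>
          #include <vector>

          // Row-major traversal touches memory sequentially, so every cache line
          // fetched from RAM is fully used before the next one is needed.
          double sum_row_major(const std::vector<double>& m, std::size_t rows, std::size_t cols) {
              double s = 0.0;
              for (std::size_t r = 0; r < rows; ++r)
                  for (std::size_t c = 0; c < cols; ++c)
                      s += m[r * cols + c];            // stride 1: cache friendly
              return s;
          }

          // Column-major traversal of the same data jumps `cols` elements at a time,
          // wasting most of every fetched cache line -- typically much slower.
          double sum_col_major(const std::vector<double>& m, std::size_t rows, std::size_t cols) {
              double s = 0.0;
              for (std::size_t c = 0; c < cols; ++c)
                  for (std::size_t r = 0; r < rows; ++r)
                      s += m[r * cols + c];            // stride `cols`: cache hostile
              return s;
          }

          int main() {
              const std::size_t rows = 4096, cols = 4096;
              std::vector<double> m(rows * cols, 1.0);
              // Same result either way, but the row-major walk is usually several times faster.
              std::cout << sum_row_major(m, rows, cols) << ' ' << sum_col_major(m, rows, cols) << '\n';
              return 0;
          }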
  12. OpenVPN (free & open source): https://openvpn.net/index.php/open-source/documentation/howto.html It is available pre-packaged in most Linux distros' package managers.
  13. Don't really know what kind of answers you expect here, as explaining how a phone/computer works with any level of detail requires A LOT of knowledge. Here's a crash course on how modern CPUs are designed, in broad strokes: http://www.lighterra.com/papers/modernmicroprocessors/
  14. btrfs uses the entire hard drive, it seems. You need to run:
      # genfstab -U /mnt >> /mnt/etc/fstab
      then:
      # arch-chroot /mnt /bin/bash
  15. Yes, it copied files, but you're not done; you can't remove the USB stick yet. You still need to set up a lot of stuff, but you should be ready to arch-chroot into it after you generate your fstab. You should try to follow this guide, you are up to the Configuration part: https://wiki.archlinux.org/index.php/beginners'_guide#Configuration
  16. And yes, that was the installation process; it just copied a base installation of Arch onto your btrfs partition mounted on /mnt.
  17. Arch uses pacman, not apt:
      pacman -S xxxxx == apt-get install xxxxxx
      pacman -Syyu == apt-get update; apt-get upgrade
      ....
  18. Great, then you're done with the partitioning part and it is mounted to /mnt.
  19. If you're going with my suggestion, how much RAM do you have? Just in case you might need a swap partition...
  20. Good point, I've never used btrfs myself. @Pretzel, if you want to use ext4 let me know and I'll keep guiding you through parted, but this might be a good option.
  21. OK, that should be BIOS, perfect.
      # parted /dev/sda
      (parted) print
      What's the output of this?
  22. We will cover how to partition it. But first off, do you have BIOS or UEFI?