• 0 Posts
  • 22 Comments
Joined 1 year ago
Cake day: June 9th, 2023

  • Congrats! I appreciate this post because I want to be where you are in the not too distant future.

    Contributing to Open Source can feel overwhelming, especially if working outside of one’s primary field. Personally, I’m a scientist who got interested in open source via my academic interest in open science (such as the FAIR principles for scientific data management and stewardship, which are that data should be Findable, Accessible, Interoperable and Reusable). This got me interested in how scientists share code, which led me to the horrifying realisation that I was a better programmer than many of my peers (and I was mediocre).

    Studying open source has been useful for seeing how big projects are managed, and I have been meaning to find a way to contribute (because as you show, programming skills aren’t the only way to do that). It’s cool to see posts like yours because it kicks my ass into gear a little.




  • The data are stored, so it’s not a live-feed problem. It is an inordinate amount of data that’s stored though. I don’t actually understand this well enough to explain it well, so I’m going to quote from a book [1]. Apologies for wall of text.

    “Serial femtosecond crystallography [(SFX)] experiments produce mountains of data that require [Free Electron Laser (FEL)] facilities to provide many petabytes of storage space and large compute clusters for timely processing of user data. The route to reach the summit of the data mountain requires peak finding, indexing, integration, refinement, and phasing.” […]

    "The main reason for [steep increase in data volumes] is simple statistics. Systematic rotation of a single crystal allows all the Bragg peaks, required for structure determination, to be swept through and recorded. Serial collection is a rather inefficient way of measuring all these Bragg peak intensities because each snapshot is from a randomly oriented crystal, and there are no systematic relationships between successive crystal orientations. […]

    Consider a game of picking a card from a deck of all 52 cards until all the cards in the deck have been seen. The rotation method could be considered as analogous to picking a card from the top of the deck, looking at it and then throwing it away before picking the next, i.e., sampling without replacement. In this analogy, the faces of the cards represent crystal orientations or Bragg reflections. Only 52 turns are required to see all the cards in this case. Serial collection is akin to randomly picking a card and then putting the card back in the deck before choosing the next card, i.e., sampling with replacement (Fig. 7.1 bottom). How many cards are needed to be drawn before all 52 have been seen? Intuitively, we can see that there is no guarantee that all cards will ever be observed. However, statistically speaking, the expected number of turns to complete the task, c, is given by: [c = n × (1 + 1/2 + … + 1/n)] where n is the total number of cards. For large n, c converges to n*log(n). That is, for n = 52, it can reasonably be expected that all 52 cards will be observed only after about 236 turns! The problem is further exacerbated because a fraction of the images obtained in an SFX experiment will be blank because the X-ray pulse did not hit a crystal. This fraction varies depending on the sample preparation and delivery methods (see Chaps. 3–5), but is often higher than 60%. The random orientation of crystals and the random picking of this orientation on every measurement represent the primary reasons why SFX data volumes are inherently larger than rotation series data.

    The second reason why SFX data volumes are so high is the high variability of many experimental parameters. [There is some randomness in the X-ray pulses themselves]. There may also be a wide variability in the crystals: their size, shape, crystalline order, and even their crystal structure. In effect, each frame in an SFX experiment is from a completely separate experiment to the others."
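    To make the card-deck analogy concrete, here is a rough Python sketch of my own (not from the book). It assumes the expected number of draws is the standard "coupon collector" result, n × (1 + 1/2 + … + 1/n), which does reproduce the quoted figure of about 236 for n = 52. The function names and the 60% blank fraction passed in at the bottom are illustrative assumptions.

```python
import random
from fractions import Fraction


def expected_draws(n: int) -> float:
    """Expected draws with replacement needed to see all n items: n * H_n."""
    return float(n * sum(Fraction(1, k) for k in range(1, n + 1)))


def expected_shots(n: int, blank_fraction: float) -> float:
    """Expected shots when a given fraction of shots is blank (no card drawn)."""
    return expected_draws(n) / (1.0 - blank_fraction)


def simulate_draws(n: int, rng: random.Random) -> int:
    """Draw with replacement until every one of the n items has been seen."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


if __name__ == "__main__":
    n = 52
    rng = random.Random(0)  # fixed seed so repeated runs agree
    trials = 10_000
    mean = sum(simulate_draws(n, rng) for _ in range(trials)) / trials
    print(f"exact expectation: {expected_draws(n):.1f}")  # about 236
    print(f"simulated mean over {trials} trials: {mean:.1f}")
    # Assumption: 60% of shots are blank, taken from the "often higher than 60%" remark.
    print(f"expected shots at 60% blank: {expected_shots(n, 0.6):.0f}")
```

    One thing this makes visible: n*log(n) on its own gives roughly 205 for n = 52, so the 236 in the quote seems to come from the exact sum rather than the large-n approximation.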

    “The Realities of Experimental Data”: “The aim of hit finding in SFX is to determine whether the snapshot contains Bragg spots or not. All the later processing stages are based on Bragg spots, and so frames which do not contain any of them are useless, at least as far as crystallographic data processing is concerned. Conceptually, hit finding seems trivial. However, in practice it can be challenging.”

    “In an ideal case shown in Fig. 7.5a, the peaks are intense and there is no background noise. In this case, even a simple thresholding algorithm can locate the peaks. Unfortunately, real life is not so simple”
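    For a sense of what "a simple thresholding algorithm" could look like in that ideal, noise-free case, here is a toy Python sketch. This is my own illustration, not the method from the book or from any real SFX software. The names `find_peaks` and `is_hit`, the threshold, and the minimum peak count are all made up for the example.

```python
def find_peaks(image, threshold):
    """Return (row, col) of pixels above threshold that beat all their neighbours.

    `image` is a list of equal-length rows of numbers. A pixel counts as a peak
    only if it is strictly greater than every adjacent pixel, so flat plateaus
    of equal values are not reported.
    """
    rows, cols = len(image), len(image[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = image[r][c]
            if value <= threshold:
                continue
            neighbours = [
                image[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > nb for nb in neighbours):
                peaks.append((r, c))
    return peaks


def is_hit(image, threshold, min_peaks):
    """Call a frame a 'hit' if it has at least min_peaks thresholded peaks."""
    return len(find_peaks(image, threshold)) >= min_peaks


if __name__ == "__main__":
    frame = [[0] * 5 for _ in range(5)]
    frame[1][1] = 100  # two bright, isolated spots on a zero background
    frame[3][3] = 80
    print(find_peaks(frame, threshold=50))
    print(is_hit(frame, threshold=50, min_peaks=2))
```

    A fixed threshold like this only works when the background really is flat, which is presumably why the quote goes on to say real life is not so simple.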

    It’s very cool, I wish I knew more about this. A figure I found for approximate data rate is 5 GB/s per instrument. I think that’s for the European XFEL.

    Citation: [1]: Yoon, C.H., White, T.A. (2018). Climbing the Data Mountain: Processing of SFX Data. In: Boutet, S., Fromme, P., Hunter, M. (eds) X-ray Free Electron Lasers. Springer, Cham. https://doi.org/10.1007/978-3-030-00551-1_7



  • He doesn’t directly control anything with C++; it’s just the data processing. The gist of X-ray crystallography is that we shoot X-rays at a crystallised protein, which scatters them by diffraction. We then take the resulting diffraction pattern and do some mathemagic to figure out the electron density of the crystallised protein, and from there work out the protein’s structure.

    C++ helps with the mathemagic part of that, especially because by “high throughput”, I mean that the research facility has a particle accelerator that’s over 1 km long, which cost multiple billions because it can shoot super bright X-ray pulses at a rate of up to 27,000 per second. It’s the kind of place that’s used by many research groups, and you have to apply for “beam time”. The sample is piped in front of the beam and the result is thousands of diffraction patterns that need to be matched to particular crystals. That’s where the challenge comes in.

    I am probably explaining this badly because it’s pretty cutting edge stuff that’s adjacent to what I know, but I know some of the software used is called CrystFEL. My understanding is that learning C++ was necessary for extending or modifying existing software tools, and for troubleshooting anomalous results.







  • I wonder what would help people make their own solutions in this way. Like, I have made a few apps or automation things myself, but if I look at my “normie” friends who don’t have the level of tech familiarity that I do, they struggle with whatever out-of-the-box solutions they can find. Poor IT education is a big part of this, and I’ve been wondering a lot about what would need to change for the average “normie” to be empowered to tinker.




  • I think people like your father make bank because even though new programmers could learn COBOL, that wouldn’t be enough for them to fill the same niche your father and other established COBOL programmers occupy. Any programming language has a disparity between “the proper way to do things” and the kind of kludges you see in the field, but few have the kind of baggage that COBOL does, in terms of how long it’s been around and how much has been built on top of it.


  • This reads like a poem, I unironically love this

    I am the Rust programmer,
    I will rewrite the world in Rust.
    I will rewrite the world in Rust
    because the world is unsafe.
    As I am the Rust programmer
    I will keep writing rust
    until the world is safe.
    After the world is safe,
    I will not rewrite it in Rust.
    Because I am the Rust programmer
    I will retire from programmer in Rust.

    I will come to you when you are sleeping,
    and I will unlock your computer
    using a memory leak.
    If I find javascript on your computer,
    I will delete them.
    Do not try to stop me,
    if you try to stop me
    I will do it anyways.
    I am the Rust programmer,
    if you program in javascript,
    you will scream.

    You will be sleeping
    as I rewrite your computer in Rust.
    You will not notice me
    as I am the Rust programmer,
    I am fast,
    but not too fast for your computer.
    I know your computer
    just as it knows me.
    After I rewrite your computer,
    you will love your computer.
    You will love your computer
    because it is written in Rust,
    I will do the same to all computers because
    I am the Rust programmer.

    I will not stop at your computer,
    I will rewrite the world
    because the world is unsafe.
    Your brain is written in C,
    your memory is unsafe.
    If your brain is written in C,
    you will forget what I just said.
    I will rewrite your brain in Rust,
    you cannot stop me from writing Rust
    as I am the Rust programmer.
    If you try to stop me,
    you will not remember it.
    Because I am the Rust programmer I can
    manually remove your memory,
    you will not remember me.
    After I rewrite you in Rust,
    you will enjoy the world
    with a safe memory,
    you will not forget
    that I am superior,
    I am the Rust programmer.

    I will rewrite the world,
    I will rewrite quantum mechanics
    because it is unsafe.
    I will not tell you all my plans
    before I rewrite you in Rust,
    It is because you are made of bugs
    I do not trust you.
    I am the Rust programmer,
    I will rewrite the world in Rust,
    you will not forget me
    Because I am the Rust programmer.

    (n.b. I’m bad at scansion, forgive any poor line break choices)


  • Though I wonder if, even besides adding an uninterruptible power supply (UPS) (writing the acronym out for anyone else who would’ve had to Google it), this might be a useful exercise in recovering from outages in general. This is coming from someone who hasn’t actually done any self-hosting, but you saying you’re still finding down services reminds me of when I learned the benefit of testing system backups as part of making them.

    I was lucky in that I didn’t have any data loss, but restoring from my backup took a lot more manual work than I’d anticipated, and it came at an awkward time. Since then, my restore-from-backup process has become way more streamlined.




  • I’m bi, but my appearance is pretty queer coded such that cis-het people tend to read me as “unclear gay or just tech-nerd punk”. I’ve found that when I use the word partner, it can throw people off because they’re clearly fishing for my partner’s gender in an “I can’t tell whether this person is straight or gay” way. Most of the people I’ve dated have been men, but I do like the chaos energy of the confusion.