If I designed the schema it is most certainly going to be structured. Unstructured databases are awful.
If I had a cookie for every time someone asked me if I could pull together some obscure metric during a 30 minute meeting I’d have the beetus
Same. Especially since I’ve been building EDWs for most of my career. People are always surprised that it actually takes time to integrate with different systems.
“What do you mean you can’t just pull all the data out of this system that we don’t have database access to and are still building out the APIs?”
I kid… The people asking for stuff don’t know what backend databases and APIs are.
Just hit the magic ETL lever, what’s the big deal?
I mean without documentation… what is an API?
Maybe this makes me old, but I much prefer a written document explaining how an API works over Swagger.
I prefer a hybrid approach. A document explaining some common things to do and generally the idea behind why the API is structured that way (shows me you actually thought about it, and makes it more logical to find different parts of it without necessarily looking it up), and then an API spec showing all the parameters.
I mean, if you’re really good at SQL these requests are doable in 10-30m + the time it takes to run and export.
It’s not just being good at SQL, it’s knowing the data and relationships therein.
Even then, 5 “quick” requests takes up most of your day
Yep, being familiar with the data model is 98% of the effort.
The remaining 2% is the query
The fun comes when there is no actual data model. All in all, I’d say being familiar with the data model is about 60% of my job. 35% is building queries and query scripts for people who need regular exports. 5% is running after other people’s fuckups.
Strap in, because this is a ride.
There is a raw database from a decade-and-a-half old app, which I get to access through a layer of views that does some joining, but not all, with absolutely no documentation on how the original database is structured or where things are pulled from or what anything refers to. No data dictionary, no list or map of key relations, some objects are mapped in two different views, no semantic naming of columns.
If you want to want to query order part delegations by who they’re assigned to (Recipient in the app) you need to use the foreign key
RefAssignmentUnit
. The “Assignment” unit that did the delegation is justRefUnit
. If you have orders that were created by a salesperson on behalf of a customer,OrderingPerson
(also a foreign key, but not named Ref-) is the customer, whileOrderingPerson2
is the salesperson that entered the order. Don’t confuse that withCreator
, which for orders created through the web form is usually a technical user, unless the salesperson is one of the veterans that use the direct app in which case it’ll be the salesperson whileOrderingPerson2
is null.Also, we have many-to-many relationships that are mapped through reference tables… whose columns are named
object
andreference
for each and every one. Have fun trying to memorize which refers to which so you don’t need to look it up every damn time.Create my own views to clean this up? Nope, only the third party service providers for the app can do that, and they don’t wanna. Our internal app admin (singular) can use some awkward tool to generate those views, but there’s no reverse lookup to see what a given column refers to. Also, they have no concept for what actually constitutes a good model because they’re not really familiar with the database, just with the app.
Get my own serverless DB to create views that query the original DB? No can do, you’d need to order a whole server and that’s pricy.
Get a cloud DB? Sure, but it will be managed by the cloud team and if you want to have or edit custom views, you’ll get to create a project request. They’ll put it in the backlog and work it into some future sprint.Get literally any tool that allows me to efficiently create reusable data prep so I don’t have to copy & paste the base transformations needed for a given query every fucking time and if the source DB ever changes I need to update all my query scripts? If you can somehow squeeze the time to prepare a convincing pitch - a full Power Point presentation, of course - between all your tedious and redundant query preparation and script maintenance, find a management sponsor willing to hear you out and hopefully propose your request to their superiors. Best case: It becomes a whole project - alternatives will have to be considered first, implications, security, costs, and you’ll be the one having to assemble and present that information to management only to have some responsible person point out that it would actually be the remit of a different team… that also works in sprints, has a backlog and will give you no control over your prep.
And obviously, the app provider doesn’t give us any advance notice of just what will change in the DB with the next update. We only learn that when a view breaks. The app admin can use the tool to refresh the affected views then, while I scramble to determine all the scripts that need to be updated and copy&paste the fix. If a user has been granted their own access to the database, odds are they’ll come crying to me when their modified versions of my queries break.
There is a lot I like about my job, I acknowledge the difficulties of a historically grown system and service contracts, but the rigid and antiquated corporate culture can go take a long walk off a short pier.
…you have my condolences
Thank you. Funny enough, just today I chatted with a colleague that mentioned one of the tools was technically available to us, but actually not approved for use. Ordering it wouldn’t be an issue, but getting it signed off would be quite the chore.
And then managers go “why does shadow IT exist?”
“some ideal clean pristine table” sounds like the kind of sass I’d expect from a teenager about to convince me that it’s somebody else’s job to clean their room and that his database is something that Doesn’t Need Love and is Happy, Actually to be treated like a storage dump pumped full and forgotten until the boys arrive and it’s time to mock the signs of neglect in front of them
or
identify imperative with the masculine and declarative with the feminine and cultivate balance
Some guy opened like 10 jira tickets asking a bunch of data that is already available for them on the BI. I show it how to extract it himself, and then marked all the other ones as duplicated and for a moment I felt myself as an stackoverflow mod.
I don’t think that’s how the where clause works
Just chuck an =1 at the end, bam
Yeah, that’s more appropriate for a WHEN clause or possibly even a WHOM
I’m still waiting for the WHY clause
There’s a WHOM keyword in SQL?
Hopefully they’ll one day allow WHENCE as an alias of FROM
The original didn’t have the where clause.
Even when you give them the tools to pull the data themselves like Tableau they’ll still ask you to pull the data for them.
Well, can you just give me access to the database then?
*Acess granted to the homologation DB that haven’t been fully updated since 2018*
That’s not an unreasonable request unless the manager is a dumb asshole and doesn’t understand that such request can take more than one day and other tasks will suffer because of it. Done it dozens of times by request of my immediate manager.
I got pretty good at pulling sql data together from multiple tables using joins at a prior job, but no job before or since had data that easily accessible.
A girl I’ve dated for a while worked as photographer for live events reportage, clocking even thousand shots for event and saving at least a hundred of them for the job, and she told me rather often she was being later contacted by the client, or someone of his entourage, or even some other person from the public, months past the event and asking if she could send them e.g. “that picture where I’m standing with that friend of mine wearing a white shirt…”, and all that of course without even being able to tell her the actual date of the event.
Ai solves this
Not sure why you’re being downvoted, this might be the best feature of Google photos
General sentiment of groupthink against ai by dumbasses here. AI has its place though, and this type of analysis is the perfect application for ai for image content search.
We taught the infant AI to say Hitler as a joke then we got bored so now the only way it knows how to get our attention is by heiling in the school hallways while we get drunk in the comment section blaming capitalists and reddit for our absent parenting.
You can make an unstructured database? I thought the S in SQL stood for “structured”, that it was built into the language itself or something.
Doesn’t it stand for “supposed to be structured”?
STBSQL?
“Structured” refers to the query language.
SQL doesn’t actually enforce the database to be normalized at all.
It is “structured” but not well architectured/designed/structured
Database is organized collection of data, so a disk full of porn in different formats from json to mp4 can be a database, as long as it’s organized in some way
I am interested in your json porn.
{ "act1": { "position": "reverse cowgirl" "etc...": {} } }
Not sure what you expected
Edit: also found this https://json-porn.com/
Oh God, don’t watch the etc porn! You’ll never be able to unsee it…
It’s so straight. I prefer yaml porn
For real. Numbers are strings? Yeah, okay.
YAML is better. UCL porn though. 🥵 Things are getting niche when UCL shows up.
as long as it’s organized in some way
Right? Organized, structured, same thing, or? A database can’t have no structure, right? I don’t even know how one would create such a database.
What do you mean by “no structure”? Afaik mongodb does not enforce a schema in a collection by default
Ah yes, mongo and document databases, forgot about those. Yeah those could be a pain to get data from if there’s no structure. 😅
At a certain level all data is a pair (some name, blob of bytes). You can concatenate sequences of those pairs into a tar archive and call that a database. To access “the last object” you’d have to seek over the “first” objects. So you can build another set of (some name, blob of bytes) that serves as an index into the first set. You’ll first have to do at least one full pass over that first set, and you’ll need to make space on the books to account for twice as many sets, AND you’ll still have to do some seeking over the “first objects” in the indexing collection, but it all keeps recall times very short!
Heres a great idea write a frontend that u can type a human request in then pass that to chatgpt to generate an appropriate sql query then automatically send it. What could possibly go wrong.
This is sarcasm, right? Right??