one thing i'm not a huge fan of is dgraph's UID model, which is effectively an auto-incrementing uint across the entire cluster. because it auto-increments server side, it's non-deterministic; ingesting 10 nodes before 10 others means the assigned UIDs will differ, even though the XIDs are the same.
there is a way to use "blank nodes" to link nodes and edges without knowing the integer UIDs, but that only works within a single mutation, not per-commit or per-transaction. there is no way to tell dgraph what the UID should be.
that means that if you have external unique IDs that you already have infrastructure around, you are either caching each node's UID externally or doing an XID->UID lookup every time you want to create edges.
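for what it's worth, the pattern i ended up with is the caching one: write via a blank node, then stash the UID that comes back in the mutation response. a rough sketch with the dgo v2 client (the xid predicate, connection address, and values are just assumptions for illustration):

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	// Mutation via a blank node ("_:user"), storing the external ID on an
	// "xid" predicate (assumed to exist and be indexed in the schema).
	p := map[string]interface{}{
		"uid":  "_:user",
		"xid":  "external-id-123",
		"name": "Alice",
	}
	pb, _ := json.Marshal(p)

	txn := dg.NewTxn()
	resp, err := txn.Mutate(context.Background(), &api.Mutation{SetJson: pb, CommitNow: true})
	if err != nil {
		log.Fatal(err)
	}

	// resp.Uids maps the blank node name to the UID Dgraph assigned; this is
	// what you'd cache externally (XID -> UID) so you can build edges later.
	fmt.Println("assigned UID:", resp.Uids["user"])
}
```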
there is a bulk loader but that's only available in HA mode, and the UID:XID map it generates is obviously for data you already had in flat files (or whatever). so it's ok for static data sets, but not ideal for live updating data.
the gRPC API also has strange, undocumented (AFAICT) behavior where even smallish batches of 100 hit some unspecified gRPC limit, so you need smaller batches, ergo more commits, ergo more wasted compute.
> there is no way to tell dgraph what the UID should be.
There is. You can lease UIDs from Zero, and do your own assignment. Look at /assign endpoint [1]
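For example, something like this (a rough sketch; it assumes Zero's HTTP port is the default 6080, and the startId/endId response shape is taken from the docs):

```go
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	// Lease a block of UIDs from Dgraph Zero (default HTTP port assumed to be 6080).
	// How you parse the range and hand UIDs out to your writers is up to you.
	resp, err := http.Get("http://localhost:6080/assign?what=uids&num=1000000")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body)) // e.g. {"startId":"...","endId":"..."} -- shape assumed from the docs
}
```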
> doing an XID->UID lookup in order to create edges.
Also, you can use upserts to do an XID lookup before creating a new node, which is practically what other DBs do too.
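Roughly like this with the dgo v2 client (just a sketch; the xid predicate and values are placeholders, and it assumes xid is indexed for eq):

```go
package main

import (
	"context"
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	// Upsert: look the node up by its external ID; if it exists, uid(u) resolves
	// to it, otherwise a new node is created with these triples.
	query := `query { u as var(func: eq(xid, "external-id-123")) }`
	mu := &api.Mutation{
		SetNquads: []byte(`
			uid(u) <xid> "external-id-123" .
			uid(u) <name> "Alice" .
		`),
	}
	req := &api.Request{Query: query, Mutations: []*api.Mutation{mu}, CommitNow: true}

	if _, err := dg.NewTxn().Do(context.Background(), req); err != nil {
		log.Fatal(err)
	}
}
```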
> there is a bulk loader but that's only available in HA mode
Don't know what that means. Bulk Loader is a single process (not distributed), and can be used to bootstrap a Dgraph cluster. The cluster can be a 2-node cluster or an HA cluster; that doesn't matter.
> where even smallish batches of 100 hit some unspecified gRPC limit
Never heard of that. gRPC does have a 4GB per-message limit, but I doubt you'd hit that with 100 records.
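If you are hitting a client-side size limit, the Go gRPC client lets you raise the per-message caps when dialing (a sketch; the 32 MB figure is only an example):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
)

func main() {
	// Raise the gRPC per-message size limits on the client connection
	// (example value only; tune it to your batch sizes).
	const maxMsgSize = 32 << 20 // 32 MB

	conn, err := grpc.Dial("localhost:9080",
		grpc.WithInsecure(),
		grpc.WithDefaultCallOptions(
			grpc.MaxCallRecvMsgSize(maxMsgSize),
			grpc.MaxCallSendMsgSize(maxMsgSize),
		),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	_ = conn // hand this conn to the dgo client as usual
}
```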
i hope this comes across as constructive rather than critical feedback, but it'd be really, really nice to surface that assign endpoint in some form or fashion in the Mutation documentation. it is completely absent from there, and i don't recall seeing it in the tour of dgraph either.
furthermore, it's absent from the golang client. the documentation states:
> It's possible to interface with Dgraph directly via gRPC or HTTP. However, if a client library exists for your language, this will be an easier option.
however it looks like i'll need an additional HTTP layer to interface with the /assign endpoint. not a huge deal, but that seems like a big functionality gap in the golang client - would definitely like to see that added.
lastly, the /assign endpoint and the bulk loader can only be run against a Dgraph Zero instance, which, as far as i can tell, doesn't run by default with the provided docker image. that's an important detail that's not super duper obvious from the docs, until you start seeing parameters like dgraph-zero and realize it doesn't come with the quick start docker image.
again, hope this isn't taken personally. thanks for your work on the project!
No worries at all, I like to hear feedback from users, whether it's positive or negative. Though, I also like to separate the wheat from the chaff, which is why I have suggestions / follow-up questions, etc.
The assign endpoint is something you only need to call once. You could say, give me a million UIDs, and then use them however you want. You don't need to call it repeatedly.
Also, it's an endpoint on Zero, not on Alpha. Zeros are not supposed to be talked to directly in a running cluster. We're now doing work around exposing some of Zero's endpoints via Alphas, in our GraphQL rewrite of the /admin endpoint. So, that might make it easier.
I think the consistent theme I'm hearing here is that our documentation isn't clear -- we aim to improve that. But we could use more critical, logical feedback / suggestions on our forum -- so please feel free to pitch in there.