Yes, you can have exactly-once delivery

(blog.rongarret.info)

51 points by lisper 9 hours ago | 99 comments

In case you're someone who actually knows anything about distributed systems and you're not looking forward to slogging through this long article filled with claims like "I have a PhD in AI so I know what I'm talking about" to find where the author made their mistake, let me save you the time. It's the typical conflation of exactly-once delivery with exactly-once processing, which the author acknowledges and then chooses to ignore because they're basically the same for practical purposes, as if this somehow changes the reality of the delivery itself and the restrictions on its guarantees.

Yes, everyone knows you can layer an idempotency mechanism on top of at-least-once delivery to achieve exactly-once processing (so long as you're willing to tie up memory/storage for an infinite amount of time). But this does not equate to exactly-once delivery, and you know that.

Groxx 7 hours ago | root | parent | next |

Yea... this is kinda inflammatory but I honestly have to agree with it.

The post largely summarizes as "you can have exactly-once delivery if you re-define it to be at-least-once processing with idempotency".

Those are different things.

In fact, that's the entire point behind saying that it's impossible.

You can't design a system that is exactly-once at any level, so don't even bother trying. If someone wants you to guarantee something will happen, you can point to that impossibility to say "you need to retry, it's not optional, and anyone who tells you otherwise is lying to you". That has happened to me multiple times in my career; it's a thing charlatans keep trying to sell to businesses, and businesses eat it up because it sounds so much magically simpler than what their engineers keep telling them needs to be done.

Because it is magic. It doesn't exist.

philipswood 7 hours ago | root | parent | next |

So if you build systems using messaging middleware you get to choose upfront whether you are going to use XA with exactly-once semantics and pay the performance penalty for that or whether you will implement your business logic idempotently on processing instead.

You can do this across a whole landscape of vendors.

So this whole class of "that's impossible" responses sounds to me (as an ex-brick layer) like "obviously you can't stack bricks next to each other perfectly straightly - so building walls is clearly impossible".

So it feels a bit jarring when several vendors allow you to do this impossible thing.

Google for JMS and exactly-once and you can find documentation with several products on exactly how to do this.

One example: https://www.atomikos.com/Blog/TheSimpleSecretOfExactlyOnceDe...

Groxx 6 hours ago | root | parent |

>In this post we will show how [exactly once delivery] can be done, plus how simple it is with Atomikos. ...

>The producer should do its processing and send its message as part of a JTA/XA transaction. This ensures a message is sent if and only if the transaction commits. Any failures will result in rollback of the JTA/XA transaction - and no message will be sent. This means that failures can be safely retried until they succeed, without sending the resulting message more than once.

This is at-least-once (retries) with probably-idempotency: a transaction.

As I said: charlatans.

They're selling you "exactly once" in the headline, but clearly state that it is not exactly once in the legalese below. This is less "buyer beware" and more "blatantly false advertising", in exactly the same way as a free energy machine. The abstraction they're offering may indeed be useful, but "exactly once" is literally lying, and I wouldn't trust them to be honest elsewhere eit

Gosh, only after this comment I understood why so many programmers litter the code with retries, even though they seem superfluous.

EGreg 7 hours ago | root | parent | prev |

This is sorta like the argument about CAP impossibility theorem while in practice consensus algorithms work 99.999% of the time. Or like Shannon’s information theory showing impossibility of compression, while many compression algorithms work well on actual data.

This seems to me the same. In practical applications, you can indeed have the at-least-once delivery with an idempotency / backpressure system, and work 99% of the time, and be unavailable 1% of the time.

Groxx 7 hours ago | root | parent |

Yep, and for practical applications (i.e. stuff that exists in this universe) that is absolutely good enough. You just have to choose which tradeoffs you can stomach best. With a fancy enough system, those tradeoffs can be driven shockingly low.

But if someone tries to sell you a database that is 100% available and has perfect consistency, you can laugh and walk away. They're a flat-earther trying to sell you a bridge: they're either trying to trick you or they have no idea what they're talking about. Either way you don't want to be involved with them.

Honestly, I really do find the traditional nomenclature to be a little pointless. It seems like the classic saying assumes that it's somehow okay to assume infinite time for re-delivery, but not infinite memory for memoization for some reason. On the other hand, in real life there aren't unlimited numbers of messages and you rarely want to accept infinitely stale messages either, so it's a bit moot. I'd go as far as to say that in practice you really can't guarantee a message will be delivered and processed because you will have finite bounds on time, the absolute best you can do is at least guarantee that it either was definitely processed once or probably was not and handle it accordingly. (I formerly wrote "definitely" for the latter, thinking you could do this with two-phase commit, and then realized after walking away from the computer that you absolutely can't guarantee that, of course. Distributed systems are such a pain to reason about.)

Do I misunderstand?

jhanschoo 5 hours ago | root | parent |

> On the other hand, in real life there aren't unlimited numbers of messages and you rarely want to accept infinitely stale messages either, so it's a bit moot.

My understanding is that these happen IRL all the time in the guise of healing a network split or rebooting crashed nodes or bring new uninitialized servers into the system. Of course, IRL you usually translate the result to needing a different strategy to bring these systems up to speed beyond a certain threshold. But these thresholds and strategies and changing the number of nodes in the system are application-dependent, so the fiction of unbounded messages/memory/time helps focus the formal analysis and result.

In the context of, say, a distributed KV store, it cautions you that unless you have said other strategy, you will end up with an inconsistent system or failure state if your message buffers are more space-constrained than required.

[flagged]

[deleted]

Spivak 7 hours ago | root | parent | prev |

This article isn't for you then, this article is for people who have casually heard that exactly once delivery is impossible and take it to mean exactly once processing is impossible. When someone talks about at-least-once and at-most-once in the context of well-known queueing systems they say will say delivery but will mean processing, because as you say, they're the same in practice.

You typically hide the processing bit so that from the perspective of your application code it really is exactly-once.

bhaney 7 hours ago | root | parent | next |

> this article is for people who have casually heard that exactly once delivery is impossible and take it to mean exactly once processing is impossible

Those people would be better served by approximately two sentences clarifying that exactly-once processing is a different thing that can be achieved with at-least-once delivery and idempotency, rather than 20+ rambling paragraphs of redefining formal terms.

Ferret7446 7 hours ago | root | parent | prev |

I think the words "delivery" and "processing" are taught around middle school. There's probably no need to have an article for it.

tacitusarc 8 hours ago | prev | next |

I appreciate the insights here, but I am struggling to understand how “exactly one” can equate to “eliminate duplicates”. Let’s say someone arrived at my house and cut my grass, and I failed to confirm they had done so, so the company sent someone over to cut my grass again, maybe multiple times. It seems silly to claim my grass was cut exactly once, despite it consistently remaining at the same height. Obviously it was cut multiple times, just not with much effect after the first. The point of exactly-once is that the server and client don’t need to expend pointless effort on duplicates… right?

valzam 7 hours ago | root | parent | next |

In particular what exactly once delivery implies that I do not have to worry about it in my processing logic. I can build a `count += 1` and it will always be exactly correct.

The notion that there is no distinction between exactly once delivery and exactly once processing is very odd to me. In practice my processing needs to accommodate duplicates to be correct. If I had exactly once delivery my processing could be much simpler. If I could get exactly once delivery for free I would always choose it in a heartbeat.

jchw 7 hours ago | root | parent |

The point is that it doesn't matter exactly where the deduplication matters. It could happen in your own processing code, or something upstream of it, like a queue library of some kind. That's pretty much what the entire article is saying; it's hard to meaningfully distinguish what part is actually delivery versus processing. e.g. most people would consider the guarantees imparted by the TCP stack are indeed part of delivery and not processing, but your TCP stack is having to do a lot of processing work to actually maintain the logical stream of bytes.

warkdarrior 7 hours ago | root | parent |

> The point is that it doesn't matter exactly where the deduplication matters.

Actually the point is that once deduplication is done at some layer, the layers above it will have to re-achieve exactly-once delivery.

"Yes, the TCP layer did deliver this message only once, but the receiving software crashed right after, so now the sender has to send it again."

jchw 6 hours ago | root | parent |

Hmmm. Maybe this is the reason why the processing vs delivery distinction matters. Because my thought is, well of course: To fix that you only send the acknowledgement after processing succeeds.

But then again, once you do that, the processing code that is being wrapped really doesn't have to care about being idempotent anymore, as it is being handled a layer up. At that point, all it needs to care about is being atomic.

I'm not sure if it practically matters either way. I'd rather have my processing code be both atomic and idempotent regardless just to make things easier to reason about, as long as it's not too much of a burden. I've always been a fan of concepts like idempotency tokens.

Same understanding: On the receiver side, we are going to drop duplicates (by processing, or by having no effect on the grass cutting any more). Thus, the end user is then seeing only one effect, one message delivered. The effect of delivery "message received" or "grass is cut" is achieved.

But still, the sender might need to send more than once (until confirmation). From the cost at the sender "sending multiple packages" or "sending more grass cutters" this is still the scenario "send one or more".

Sorry to fuel the fire... it is about the definition of "delivery"

theamk 7 hours ago | root | parent | prev |

We are talking network stack, so there is no actions - just data hand-off to the actual application code.

Someone arrives at your house, gives you a package, says "this is order 123". You thank them, they leave, but then they are hit by a car before they can report this. You unpack the package and use it.

Next day, someone else arrives at your house, gives you a package, says "this is order 123". You thank them, they leave. You know you've already received order 123, so you throw package away without even taking it into the house.

This happens few more times, but you don't care, your trash can is big.

Done! You now have "exactly once delivery".

Now, some might argue this is "exactly once processing" and you should only count what the delivery person does.. but this depends on where you draw the boundary. I draw it at "I am taking the package into the house", and I've only ever took one package there, so it was exactly-once for me.

The key part here is cost. I am assuming that opening package and using its contents is hard and takes a long time; while answering the door and throwing the package away is easy. This is definitely the case with modern networking stack, which re-transmits stuff all the time, and where the loss rate is very low.

tacitusarc 6 hours ago | root | parent |

As this is a semantic debate over the definition of delivery, I asked my very non-technical wife if she thought in the scenario you described, the package was delivered exactly once. She said obviously not, and this discussion is very stupid, and I should stop participating in it. So there’s that.

jbergens 6 hours ago | root | parent | next |

I don't think the example was perfect which explains your wife's reaction.

Think of it more like the first delivery guy/girl left his/her car outside and wrote 123 on it. Then walked back.

The next one sees the car with a sign saying 123 and won't even ring the door bell or leave a package. Now you haven't gotten the package twice, it has not been delivered twice.

Sure you can complain that there's a car outside your home but in digital system you won't even see it. It would also cost the deliver firm a car for every package but that is not your problem and again, in the digital world the cost is a lot less than a car.

There is an argument that the street would be filled up with delivery vans ans there would be no more room for new deliveries to you or your neighbors but that is a limitation you could talk about. You probably can't handle an infinite number of packages delivered at the same time either and you won't wait an infinite amount of time for any specific package.

Try this version with your wife.

Jach 5 hours ago | root | parent | prev |

Smart wife. My take on the whole thing is that it's not wise to reason from non-technical metaphors around packages or lawn mowing when the reality is electronic systems. I don't know if it's any wiser but what I like to do is work my way up from the basics. What does delivery mean? Start with two wires, one for signal and one for common ground. (Or just one wire, and pretend you can use earth-return reliably.) If that isn't enough to resolve what terms should mean, consider them with differential signaling. If that still isn't enough to get it, consider them with relay nodes. If at some point "delivery" has changed definitions to suddenly forbid something that previously wasn't forbidden, maybe you've made a mistake.

computerfan494 7 hours ago | prev | next |

At the end of the day the author and those they are arguing with mostly agree, they simply disagree on what the word "delivery" means. Given the author's background, I wonder if the issue is that they're focused mainly on lower levels of the stack, while those who disagree mainly work with traditional applications that do things like send email, respond to webhooks, update databases, etc.

The reason I think it's important to be pedantic about distinguishing between "delivery" and "processing" is that I have seen plenty of higher level systems that have incorrectly not implemented idempotency and had bugs as a result. I have seen many folks be confused by Kafka's "Exactly-Once Semantics" feature and introduce major bugs into message processing pipelines. The author, who clearly understands these fundamental design challenges, is not my problem. It's everyone else who struggles to design safe, idempotent exactly-once systems.

jumploops 6 hours ago | prev | next |

If I receive multiple hamburgers via DoorDash, but only eat one, it’s not “exactly-once delivery.”

The extra step where I give my neighbor(s), compost, or otherwise discard the N+1 hamburgers is a processing step.

My house can only hold so many hamburgers, and I can only process so many after eating my lucky chosen one.

This is what we (distributed systems thinkers) refer to when we say “delivery” — anything after the DoorDash step is up to us, the consumer, to process.

Yes, this definition is confusing to new programmers, because it makes it hard to reason about everyday systems, but it’s this exact type of definition that we need so that we can build the proper abstractions, as the author has done in his post, to make our applications behave the way we want.

brigadier132 7 hours ago | prev | next |

It seems like this person does not understand what exactly once delivery actually means. It's not about being able to do something exactly once if the author is reading this...

lisper 6 hours ago | root | parent |

> It seems like this person does not understand what exactly once delivery actually means.

Then by all means, enlighten me.

brigadier132 6 hours ago | root | parent |

I don't want to get into a debate over semantics with you. You've redefined exactly once delivery to mean that if you have a layer of abstraction that hides that a message was delivered multiple times then it is "exactly once" delivery. That's not what exactly once delivery is. You clearly disagree but taking well understood and researched terms and redefining them to make a point is a complete waste of time for everyone. Nobody cares about what your personal definition of exactly once delivery is, they care about the definition that is used in literally all of distributed systems research.

lisper 5 hours ago | root | parent |

I'm not asking for a debate. I'm simply asking you to tell me what "exactly once delivery" actually means in a way that isn't circular and doesn't beg the question of whether or not it is possible, i.e. the definition that is used in literally all of distributed systems research. So far no one has been able to do that.

Fire-Dragon-DoL 2 hours ago | prev | next |

Isn't "exactly once processing" also incorrect, since it's all always "at most once": the system could go down and never come back online, that would result in a missed processing

supportengineer 8 hours ago | prev | next |

Over 20 years ago I worked with some specialized commercial software, Cyclone, that did guaranteed delivery of files. Guaranteed as in the legal sense. If the server sent back a ticket number to the sending client, it was a LEGAL assurance that the file was received, because there was a contract in place with financial penalties. The time stamps were particularly important in the legal contract. So, there are a lot of ways you can have exactly-once delivery, especially when talking about the application level of the 7 layer burrito model.

hinkley 7 hours ago | root | parent |

But you sometimes delivered the bytes twice, before getting the receipt, surely?

theamk 7 hours ago | root | parent |

I am sure there were some duplicate packets on the wire - it's a normal part of most network protocols. The important thing is that as far as user was concerned, it was exactly-once.

toast0 8 hours ago | prev | next |

I don't see how you can guarantee at least one delivery, but maybe I've seen too many things disappear off busses, never to be seen again.

Filligree 8 hours ago | root | parent |

You keep sending it until you get an acknowledgement back. As the article points out, this does assume nobody cut the wire.

deathanatos 7 hours ago | root | parent |

  SEND -->
  SEND -->
       <-- ACK
       <-- ACK (this node was *suuuper* slow due to page thrashing.)

Don't try to "fix" it, either; there's no way to do that.

You make the job idempotent, in some manner: make it such that receiving the message and processing it twice is safe.

(This is what TFA eventually capitulates to, but tries to call it "exactly once delivery", even in cases where deliver is occurring more than once. I don't think this view is pedagogically useful, which is partly why we say "exactly once delivery is impossible.")

lukeasrodgers 7 hours ago | prev | next |

Here is my understanding, roughly:

- say you need a messaging system to communicate between different components - that messaging system is a 3rd party library or tool, it has no knowledge of your needs or architecture - therefore it can have no knowledge of what counts as a duplicate message, it either just blasts your message off once, or blasts them off until it gets an ack, it is up to the software you build around this component to avoid duplicate processing - so yes of course you can build "exactly once processing" on top of an "at least once delivery" system - but it still makes sense to talk about the distinction between delivery and processing, and "exactly once delivery is impossible" is still (in OP's terms) a "useful" claim

I haven't personally used kafka but it and similar systems (I vaguely recall some work by Pat Helland that may fall into a similar bucket) could possibly be said to a) constitute messaging systems, b) provide exactly once delivery semantics, in that they are less of a library and more of a framework that provide a concept of "duplicate message" that you basically buy into by using those systems.

You could then argue that "if it provides exactly once delivery it is not a messaging system", maybe there's a good argument there or maybe it's just pedantry.

benreesman 7 hours ago | prev | next |

This stuff is studied to hell and back, there is a formalism.

This stuff is practiced at staggering scale and the heuristics and cheat-while-no-one-looks stuff is gamed to within an inch of its life.

There is an acknowledged nexus of the two in the public domain: https://jepsen.io/consistency.

8 hours ago | prev | next |

[deleted]

upeguiborja 6 hours ago | prev | next |

It is only possible if the channel between your "abstraction layer" (call it another system, processing, etc) and your receiver is 100% reliable.

lisper 6 hours ago | root | parent |

Yes, that's true, but that's often a reasonable assumption.

upeguiborja 6 hours ago | root | parent |

The main confusion I see is that at-least-once delivery on a 99% reliable channel can be erroneusly called exactly once delivery 99 out of 100 times. It could be reasonable depending on the scale of your communication exercise, but if you have hundreds of thousands of receivers then asumption will surely start to fail.

lisper 5 hours ago | root | parent |

The kind of reliability I'm talking about where "100% reliable" is a reasonable assumption is inside the boundaries of a single server, i.e. a CPU, memory bus, and I/O peripherals including a NIC and non-volatile storage. Yes, things can and do fail inside that boundary, but if you start taking that into account things get much more complicated. You are also no longer in the realm of distributed systems at that point, which is the context in which the controversy about exactly-once delivery arises.

stackghost 7 hours ago | prev | next |

>a simple matter of keeping track of all the delivered messages and removing duplicates

Ron, if you have received duplicate messages then by definition you have been delivered that message more than once.

I don't have a PhD in computer science so maybe you can explain how this constitutes "exactly once".

lisper 6 hours ago | root | parent |

> if you have received duplicate messages then by definition you have been delivered that message more than once

Yes. So? If you don't include a mechanism for ensuring exactly-once delivery then you will not be guaranteed exactly-once delivery. But if you do, you will. It doesn't actually require a Ph.D.

stackghost 6 hours ago | root | parent |

>> if you have received duplicate messages then by definition you have been delivered that message more than once

>Yes.

Okay, I'm glad we agree that delivering something multiple times and discarding the duplicates is not exactly-once delivery.

lisper 6 hours ago | root | parent |

As you have described it, you are right, it's not. But presumably the thing discarding the duplicates then does something with the non-duplicates, like deliver then to something else. That is exactly-once delivery.

mlyle 5 hours ago | root | parent | next |

To discard the duplicates, you need to update your state based on the first message and consult that state for all subsequent messages. Someone must be delivered those messages for this to happen.

If you say that someone can then pass you messages exactly once over a perfectly reliable channel, well.. sure. But the system still has to deal with duplicate delivery and update its state based on that. (And potentially must durably persist this state. And, then, if your application doesn't have exactly the same persistence/transactional boundaries as what's doing the deduping, you have problems...).

stackghost 6 hours ago | root | parent | prev |

You wrote in the post:

>This post was intended to be about human communication more than distributed systems or network protocols

But resorting to technical minutiae as you have done doesn't seem germane to "human communication". Honestly, seeing such mental gymnastics just to avoid losing face on an internet argument is really disappointing.

qaq 7 hours ago | prev | next |

Well pretty confused. You can't get exactly once delivery using proposed solution in general case because you can not have infinite memory for dedupe.

lisper 7 hours ago | root | parent | next |

You apparently missed this part:

"Just for the sake of completeness I should point out that removing duplicates at the receiver is a pretty extreme oversimplification of what you would do in practice to provide exactly-once delivery. A complete solution would almost certainly be an example of Greenspun's Tenth Law applied to the TCP protocol rather than Common Lisp."

two_handfuls 7 hours ago | root | parent |

Can you explain this point? I think you are trying to say the application would include an implementation of TCP? I don't see how this is related to memory.

lisper 6 hours ago | root | parent |

I was responding to this:

> You can't get exactly once delivery using proposed solution in general case because you can not have infinite memory for dedupe.

That is a specific criticism of a specific and highly oversimplified algorithm for de-duplication that I described purely for illustrative purposes, not as a suggestion for how de-duplication should actually be implemented. Actual implementation of de-duplication is much more complicated, but a solved problem that I didn't think I needed to rehash. A real implementation doesn't require infinite memory.

(Actually, even the naive algorithm doesn't require infinite memory, just unbounded memory. Those are not the same.)

qaq 4 hours ago | root | parent |

You certainly do need to rehash it or provide a link to it being solved without constraints on time window, msgs rates and msgs sizes. You also most certainly want to build it and provide it as service because no existing product claims this capability in unbounded case.

5 hours ago | root | parent | prev |

[deleted]

lisper 7 hours ago | prev | next |

Author here. Most commenters still seem to be missing the point, despite the fact that I explicitly say this in the opening sentence:

"This post is ostensibly about an obscure technical issue in distributed systems, but it's really about human communications."

and reiterate it at the end:

"This post was intended to be about human communication more than distributed systems or network protocols."

I really don't know how I could have made this any clearer.

mlyle 7 hours ago | root | parent | next |

The problem is, you're using strong language like "under any reasonable definition of 'delivery'." But everyone else is defining delivery differently than you, referring to the delivery of the message to the system itself. Your language implies everyone else is unreasonable.

When your argument depends upon everyone else being unreasonable, maybe you're the one being unreasonable.

Yes, we can make the processing that occurs in response to those delivered message(s) idempotent. But in the end, the system has to either deal with:

1. messages being delivered once or lost entirely, or

2. messages being delivered once or multiple times

You are over-explaining a way to deal with situation #2 (detect duplicates at the endpoint).

lisper 7 hours ago | root | parent |

> referring to the delivery of the message to the system itself

And how do you define "the system itself"?

mlyle 7 hours ago | root | parent |

The thing that is at the end of the lossy medium. It must tolerate (0 or 1) or (1 or more) things being delivered to it.

lisper 7 hours ago | root | parent |

Yes, that is true. But why can't I choose to view "the system itself" as the thing that is on the other side of a de-duplicator?

It feels to me like an argument over whether or not humans can fly. An unassisted human cannot fly, but with some technological augmentation, they can. It seems a bit pedantic to deny that someone can fly from LA to New York simply because they have to get into an airplane to do it.

srkiNZ 6 hours ago | root | parent | next |

> why can't I choose to view "the system itself" as the thing that is on the other side of a de-duplicator?

because the "de-duplicator" would either:

* be somewhere else on an unreliable network (in which case we have the same problem)

* be on the same machine (or in the same process) as "the system itself" (in which case from a distributed systems perspective makes it the same thing)

> It seems a bit pedantic...

It is pedantic. The only reason that these "delivery" rules are popular is because of how many times programmers have gotten it wrong. Mostly by making assumptions that either:

* the network is reliable

* the message queue (or whatever) will de-duplicate messages for me

mlyle 6 hours ago | root | parent | prev |

Having a clear system boundary is required for analysis.

Knowing that messages will be delivered 1+ times gives us a variety of ways we could choose to deal with this on the endpoint, with different vulnerabilities. (Getting "exactly once" processing usually requires making various kinds of resilience tradeoffs based on timing windows, storage requirements, etc).

> It seems a bit pedantic to deny that someone can fly from LA to New York

At this point I question your good faith. You're calling people out by name, and you're going full on "well, aktuallyyyy" and seeming to deliberately misunderstand other peoples' assertions. "People can't breathe underwater" v. "Well, once I was in a tunnel that was under a body of water, and I still breathed!(@!("

If you choose to define words differently than everyone else, you're just sabotaging your own communication to try and feel smart.

lisper 5 hours ago | root | parent |

> You're calling people out by name

I am? Where?

> Getting "exactly once" processing usually requires making various kinds of resilience tradeoffs based on timing windows, storage requirements, etc

Yes, of course. But that's not the same as "impossible".

mlyle 5 hours ago | root | parent |

> I was reading Hacker News a few days ago and stumbled on a comment posted by ...

lisper 4 hours ago | root | parent |

Really? That is what causes you to question whether or not I'm acting in good faith?

If that's what you call "calling people out by name" I guess we'll just have to agree to disagree.

mlyle 4 hours ago | root | parent |

The whole refusal to accept that a field could legitimately define something differently than how you prefer, and then running off to blog about it and name names... and then coming around for round II of flamewar... with ever more splitting of hairs in definitions... is not awesome.

You are especially well-answered here, I think: https://news.ycombinator.com/item?id=41599131

One reason the delivery / processing distinction exists because very often the application needs to atomically persist "I have received this message" with any other state changes made as a result of processing that message for correctness. You can't generally solve this with a layer put on top, even on the same machine. If it's not atomic, then you can still deliver duplicates to the application or end up never delivering to the application. (Power goes out when one side has written but not the other).

So, the state change to "already received" and the changes you want to make in response to the message being received have to happen together. TCP or even a message queueing implementation with a persistence layer cannot solve this problem for you. Thus, the application needs to deal with multiple delivery.

Imagine a "subtract $5 from my bank account" message with no ID on the message itself, and a layer "on top" that gives IDs and tries to ensure exactly once delivery. If the layer "on top" does not change state at the exact time $5 is deducted from the account, bad things can happen-- and in practice this is impossible. Hence, the application needs to be able to cope with the "subtract $5" being delivered to it multiple times, and this deduping has to be intimately tied to it subtracting the $5 (processing).

This has been a fun thread to read, though not reflecting highly on HN. For what it's worth, I agree with you that the original adage kicking this off is kind of silly, and basically wrong. For further enjoyment I found this interesting blog post from another PhD that addresses things more comprehensively (and also basically agrees with you): https://www.mydistributed.systems/2021/10/exactly-once-deliv...

The opening lines include: "The exact definition, however, is not agreed upon in the community. As a result, there is a debate on whether EOD is possible or impossible to achieve." If nothing else, I and probably others learned today that this is apparently a debate that can quickly turn into a flamewar. And I thought flamewars were mostly dead!

Another interesting paper that came up, as I have an interest in TLA+ proofs: "LogPlayer: Fault-tolerant Exactly-once Delivery using gRPC Asynchronous Streaming" https://arxiv.org/abs/1911.11286 It seems there's no problem in the community to do things like prove fault-tolerant exactly-once delivery, even if such terminology isn't universally agreed on.

Don’t use technical terms like “exactly-once delivery” if you don’t have want it to be interrupted as technical

lisper 7 hours ago | root | parent |

It is hard to make the point that "exactly-once delivery" is not a technical term without referring to it. If you think it is a technical term, would you kindly point me to a definition? I'm particularly interested in learning how "exactly-once delivery" is distinguished from "exactly-once processing".

mlyle 7 hours ago | root | parent | next |

A 4 year old piece laying out the exact difference as it's understood and how people use the terms:

https://blog.bulloak.io/post/20200917-the-impossibility-of-e...

I read similar content at least 2 decades ago...

> While exactly-once-delivery is not possible, we have a way out: Exactly-once processing. Exactly-once processing is the guarantee that even though we may receive a message multiple times, in the end, we observe the effects of a single processing operation. This can be achieved in two ways:

> Deduplication: dropping messages if they are received more than once

> Idempotent processing: applying messages more than once has precisely the same effect as applying it exactly once

(I view deduplication as a special case of idempotency).

lisper 6 hours ago | root | parent |

> A 4 year old piece laying out the exact difference...

"Exactly-once delivery guarantee is the guarantee that a message can be delivered to a recipient once, and only once."

That seems circular to me.

Also, the author's proof is flawed. The 2GP requires more than exactly-once delivery, it requires common knowledge. It is not enough for the first general to know that the message will be received, it is required that the first general knows that the message has been received, and that the second general knows that the first general knows this, and that the first general knows that the second general knows... and so on.

mlyle 5 hours ago | root | parent |

The piece makes a very simple distinction.

Delivery is the property of a message showing up at a receiver, irrespective of the receiver making state changes.

Processing is making state changes.

You can't dedupe messages without some kind of state change. Your guards, writing down that a given message has been here before, have been delivered the message. An endpoint on a lossy medium has to cope with either (0 or 1) or (1 or many) messages.

Now, can the guards deliver it to you at most once? Well, if there's no lossy medium between, sure. But we already know that we can deliver exactly once when the medium is perfectly reliable. The guards have "processed" the message for this purpose, and the fact that they can deliver it to you over a perfectly reliable channel is moot.

The distinction is between the characteristics of the channel (delivery) versus what the receiver must do to achieve appropriate processing properties.

These might come from some combination of intrinsic idempotency, timers, persisting past messages to disks, establishing a message ordering, etc, etc, etc. These are the mechanisms that you need to cope with "one or many" delivery, and they all shape the state model of your system with respect to messages.

dragonwriter 7 hours ago | root | parent | prev |

Draw a diagram containing a source system, a destination system, and an unreliable communication channel between them. The destination system also has an output with no unreliable communications channel.

Exactly-once delivery means that a message sent from the source system reaches the destination system exactly once, and its result reaches the output channel exactly once as a consequence.

Excactly-once processing means that a message sent from the source system produces the expected output from the destination system once, even though it may be received by the destination system more than once.

(That's a little sloppy because it could use more discussion of the conditions in which it won't be received zero times, and how those are different between exactly-once and at-most-once delivery, but that's mostly beside the point because it isn't part of the distinction between exactly-once delivery and exactly-once processing. And, while definitely technical, they always involved a somewhat idealized view of the destination system, because all communications channels, including those internal to a single device, have some degree of unreliability.)

lisper 6 hours ago | root | parent |

> That's a little sloppy

Yes, that is exactly my point. The only way you can make it non-sloppy is to define "delivery" as being something that happens exclusively upstream of deduping.

dragonwriter 6 hours ago | root | parent |

No, I'm saying its "sloppy" as a definition because while it addresses the distinction you ask about it, it doesn't fully cover what distinguishes exactly-once from at-most-once.

> The only way you can make it non-sloppy is to define "delivery" as being something that happens exclusively upstream of deduping.

"Deduping" can happen in many places. If it happens anywhere before the destination system end of the unreliable connection it is part of delivery (but also can't get you to exactly-once delivery). If it happens on the destination side of the unreliable communication channel, then yes, it's not part of the delivery guarantee, it is how you get exactly-once processing from at-least-once delivery. This has been well-known for a very long time. (I don't think it was new when I first encountered it in 1999.)

Right, but the meaning of “delivery” in human communication is still different from how you are using it…

lisper 6 hours ago | root | parent |

I disagree. If I'm an actual general in a fort with a gate, and I tell the gate guards to inspect messages brought in by couriers and not allow couriers carrying duplicate messages to enter, the ones that the guards let through are still delivering those messages to me.

I had a similar thought as what's in your post, but didn't share because I thought the reaction would be something like this. I personally would word this as you can achieve exactly once semantics by combining at least once delivery with idempotency.

Your proposed definition of "delivery" is absurd.

If you have duplicate things, then you've clearly been delivered more than one thing. There is no way to deliver something exactly once, and yet the receiver has more than one thing such that they can throw all but one thing away.

It's okay to admit you were mistaken.

lisper 7 hours ago | root | parent |

> If you have duplicate things, then you've clearly been delivered more than one thing.

Yes, that's true. But this doesn't turn on what "delivery" means, it turns on what "you" means. If "you" are downstream of a de-duplication mechanism, then "you" can get exactly once-delivery. Why is that so absurd?

stackghost 6 hours ago | root | parent | next |

>Yes, that's true. But this doesn't turn on what "delivery" means, it turns on what "you" means. If "you" are downstream of a de-duplication mechanism, then "you" can get exactly once-delivery. Why is that so absurd?

So in the case of, say, a network service on server A and a network client B, your solution to "exactly once delivery" is to re-define it as "deliver it from A to B multiple times and have B deduplicate"?

Do you not see how nonsensical that is to call that "exactly once delivery"?

If you have a reliable connection between “you” and the deduplicator, then “you” aren’t receiving messages over an unreliable connection at all and so the claim that you can’t have exactly once delivery over an unreliable connection isn’t applicable in the first place. You’re receiving messages over a reliable connection and what happens upstream of that is irrelevant.

If "you" are "downstream" of a "de-duplication mechanism" how do you ensure "exactly once delivery" between "you" and the "de-duplication mechanism"?

lisper 6 hours ago | root | parent |

The same way I ensure any behavior in a digital system. There are boundaries inside of which processes are presumed to be reliable, typically consisting of a CPU, memory busses, and attached non-volatile storage. If you don't assume that those are reliable, then you can't guarantee anything.

srkiNZ 6 hours ago | root | parent |

Great! I agree 100%. We have to assume a "reliable network" within a "boundary" (i.e. a computers CPU, memory, busses etc...). Distributed systems (from which these rules are taken) are specifically systems where anything within one of these "boundaries" is considered a "single node" and treated the same, whether it's a NIC, a kernel module/driver, a user space process or anything else.

In our case if we were to take (for example) that the NIC would de-duplicate the messages for us, anyone writing the producer/sender and a user-space program for the receiver a would need to know that the NIC was doing this (or risk having messages dropped for failure of including a unique id).

This is a pedantic point, but I would strongly stress that the only reason these "delivery rules" are so popular (and evoke such a reaction) is because of the very large number of times that programmers mis-understand them.

Commonly they either assume that:

* the network is reliable

* something else will guarantee "exactly once delivery" for me (when in fact nothing will)

computerfan494 7 hours ago | root | parent | prev |

You're correct, but in my experience the vast majority of code written is not downstream of a usable de-duplication mechanism.

lisper 6 hours ago | root | parent |

That may well be, but that's a very different question.

computerfan494 6 hours ago | root | parent |

It explains why most people seem to disagree with you on what "delivery" and "you" mean in this context. For the majority of contexts, "delivery" means that a system responsible for de-duplication receives the message.

lisper 6 hours ago | root | parent |

Yeah, but that just seems like a bizarre definition on which to base the claim that you cannot have exactly-once delivery. Obviously, if you define delivery to preclude de-duplication, then you can't have exactly-once delivery. But you can have something that delivers messages (for some reasonable definition of "delivers") exactly once. It seems weird to define delivery in such a way that such a system does not provide exactly-once delivery.

bigstrat2003 6 hours ago | root | parent | prev |

I'm sorry people are being such utter jerks about this. Honestly, the comments on this are an embarrassment. Even if you are wrong, there's absolutely no call for the tone a lot of the comments are taking.

lisper 6 hours ago | root | parent |

Thanks.

jongjong 8 hours ago | prev |

Yes of course you can have exactly once delivery if you clearly define who/what the receiver is. You can easily deduplicate messages on the receiving end based on UUIDs created by the sender for example... Since they are duplicates of the exact same message, it doesn't matter which one is 'the original'.

It makes sense to worry about this only if you're worried about wasting bandwidth in the event of network instability (since the same message may sometimes traverse the network multiple times) but that's not generally something engineers should worry about.

It's ironic how some people use this to try to talk down to 'junior' developers.

Anyone can memorise hearsay about distributed systems but few can speak from experience.

qaq 7 hours ago | root | parent | next |

you can't in general case because you don't have infinite memory for dedupe.

lisper 7 hours ago | root | parent | next |

You apparently missed this:

jongjong 7 hours ago | root | parent | prev |

Good point but you don't need infinite memory. You can set expiries to discard, for example. Expiries and timeouts can be defined as part of the protocol. You can impose reasonable constraints per-socket on the buffer size for spam prevention. There is a lot of wiggle room.

On the sender side, you can require an ACK response for each message UUID and rebroadcast only if the ACK is not received within a certain timeframe.

You don't necessarily need a sophisticated algorithm to get a good practical solution which solves real problems.

mtndew4brkfst 7 hours ago | root | parent | next |

If lack of a successful ACK means you retry indefinitely until you do get an ACK, then whichever side handles deduplication need to store all observed messages indefinitely or you may deliver the same message more than once without correctly deduplicating it.

If instead you only store them for N days but an ongoing retry loop finally succeeds in N+1 days, you get duplicated delivery.

If a retry loop gives up after finite elapsed time, which sets a ceiling on how long you must retain observations, then you do not in fact achieve at-least-once delivery without using a lesser qualifying statement.

One of those constraints must be compromised on. That doesn't mean a practically useful implementation can't exist, it just means a theoretically ideal one can't without the infinite memory.

Many systems and real world processes would actually suffer negative consequences if a weeks-old payload was delivered-as-new, so hardly anyone balks at this part of the implications. But again, you have to redefine at least one of the claims (or acquire infinite memory) to have a coherent system that fulfills those claims.

two_handfuls 7 hours ago | root | parent | prev |

Right, but all this assumes timing guarantees that are not present in the original impossibility result.

nhumrich 6 hours ago | root | parent | prev |

You seem to be missing the point though. Wanting "exactly once delivery" in practice is like saying, "I shouldn't need to worry about dedupes at the application level". Which is everyone's dream, but the seniors are saying, "yes, you have to handle it at the application level, there is no way around that".