Skip to content

Comments

NonEmptyCollection builders#3837

Open
kyay10 wants to merge 16 commits intomainfrom
kyay10/non-empty-mutable-collections
Open

NonEmptyCollection builders#3837
kyay10 wants to merge 16 commits intomainfrom
kyay10/non-empty-mutable-collections

Conversation

@kyay10
Copy link
Collaborator

@kyay10 kyay10 commented Dec 29, 2025

The builders work by giving the caller a MonotoneMutableList/Set and forcing the caller to return a NonEmpty & List/Set. Operations on MonotoneMutableList/Set that are sure to add items, like add and addAll(NonEmptyCollection), have a contract that gives their receiver the apt intersection type.

This PR also removes most usages of NonEmptyList/Set(), as in the unsafe constructors, and most usages of PotentiallyUnsafeNonEmptyOperation. This also meant that a lot of warning suppressions for WRONG_INVOCATION_KIND and LEAKED_IN_PLACE_LAMBDA were finally removed!

The PR doesn't compromise on inline bytecode size, either. In particular, it would be easy to call inline lambdas multiple times to get AT_LEAST_ONCE to comply. Instead, I've extensively used do-while, which resulted in cleaner code, anyway.

@kyay10 kyay10 requested a review from serras December 29, 2025 23:00
@kyay10 kyay10 changed the title kyay10/non-empty-mutable-collections NonEmptyCollection builders Dec 29, 2025
@kyay10
Copy link
Collaborator Author

kyay10 commented Dec 30, 2025

Note to self: There seems to be a strange compilation mishap with NonEmptySet and default methods. I'll work around it for now

Copy link
Member

@serras serras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of any of these changes. In particular, I don't think "monotone list builder" is a useful concept here, or even a expected one. Unless we can ensure that at least one element is present in a list, the usual:

buidList {

}.toNonEmptyListOrThrow()

seems to be good enough for most purposes.

map: (A, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10) -> R,
): NonEmptyList<R> {
contract { callsInPlace(map, InvocationKind.AT_LEAST_ONCE) }
return all.zip(t1.all, t2.all, t3.all, t4.all, t5.all, t6.all, t7.all, t8.all, t9.all, t10.all, map).wrapAsNonEmptyListOrThrow()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please try not to change inline code as much as possible. One should be especially careful with such a big piece of code, for two reasons:

  • It makes more difficult to us to change anything in the API. With the current approach we could improve zip and wrapAsNonEmptyListOrThrow, and everybody would benefit; whereas this is a very low-level implementation.
  • The bytecode size of the user of the library is very inflated, and this is a concern in Android environments.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this particular case, zip is already inline, so the bytecode size concern is out the window.
I don't quite see how wrapAsNonEmptyListOrThrow can ever be "improved", but regardless, the final code doesn't do anything remotely like it, since it instead statically guarantees that the list is non-empty.
Is your concern one of code duplication then? As in, that it's a maintenance burden for Arrow to have 2 copies of zip, one for Iterable, and another for Nel?
If so, I can extract a general "Iterator zip" algorithm out of this that assumes the iterators are non-empty. Then, the Nel implementation trivially then just summons iterator()s, and the Iterable implementation summons iterator()s, and early-returns with an empty list if any iterator happens to be empty (i.e. hasNext() returns false immediately).
I didn't do this immediately because it'd end up being @PublishedApi. If duplication is a concern, though, then it's reasonable to expose it as such.
It's also completely reasonable if you'd instead prefer the original implementation. I simply wanted to reduce wrapAsNonEmptyListOrThrow calls, as well as any WRONG_INVOCATION_KIND warnings.

Comment on lines +10 to +13
/**
* A builder for collections that can only grow by adding elements.
* Removing elements is not supported to preserve monotonicity.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is required at all. A buildNonEmptyList function should have just a single contract: by the end of the block you should have at least one element (regardless of using add, addAll, remove, or whatever you want). Here you're introducing a completely different notion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Practically, I don't see how it can be done any other way while offering a static guarantee that the list is non-empty. Sure, you can throw if the resulting list is non-empty, but that's far from a static guarantee. One can always create intermediate Lists and remove and add to them, as long as they always have a call that guarantees the non-emptiness of the builder.

I think it's interesting to see that the Arrow functions affected got by just fine with add and addAll, never needing remove.

Just to make sure I'm fully clear, this API guarantees that your list is non-empty. The compiler will complain if you don't have an add or addAll(NonEmptyCollection), or your own custom method that has the apt contract (where, of course, you'd be expected to ultimately call add or addAll). The only way to get around this guarantee is through a bad user-defined contract (to which there are compiler checkers coming eventually), or a cast to MonotoneCollectionBuilder.NonEmpty. This is similar to how you can already break NonEmptyList if you cast its all to MutableList, at least on the JVM.

If it helps, MonotoneCollectionBuilder is a suspend-less SequenceScope/FlowCollector. add/addAll correspond to yield/yieldAll and emit/emitAll.

MonotoneCollectionBuilder is designed as an effect. It's basically a write-only accumulator (maybe we should name it as such). It doesn't know, or care, how the data is stored, just that it's written somewhere. In fact, it's exactly the effect version of the Writer monad. This is incredibly useful because it doesn't force the implementation to use a List, any monoid will do. That's why the Set implementation is so easy, since that's another monoid.
What the buildNonEmptyX functions do is that they let you use any semigroup because they guarantee at least one add or non-empty addAll call. NonEmptyList and NonEmptySet are semigroups.
One can even imagine extending MonotoneCollectionBuilder (which ought to be called Writer or Accumulator) further by adding effect combinators like what we have for Raise. A mapWriter is easily implementable, and serves as a more efficient way to map a List.

Some caffienated handwaving about connections to monoids There's a deeper connection here in the form of `MonotoneCollectionBuilder.() -> Unit` actually being kind of a free monoid over `T`, and hence isomorphic to `List`. (Technically, it's actually isomorphic to `() -> List` because of exceptions and non-termination and such, and you can add `suspend` modifiers to both, too, with the isomorphism staying). Handwaving a bit, sequencing of `MonotoneCollectionBuilder.() -> Unit` functions corresponds exactly to the monoid's `combine`, with `val id: MonotoneCollectionBuilder.() -> Unit = { }` serving as the identity for the monoid. If you've ever seen how a monoid is modelled in CT, it's through a bunch of morphisms. In some way, `MonotoneCollectionBuilder` lets you map any `T` to a morphism, with the morphism composition being the sequencing. `MonotoneCollectionBuilder.() -> Unit` is the "mother of all monoids", while `List` is the free monoid, analogously to (multi-shot) `Continuation` being the "mother of all monads", while `Free` is the free monad. There's something about "initial" vs "final" encoding here that I don't know enough to comment about. What I do konw is that `MonotoneCollectionBuilder.() -> Unit` is more performant than a `List` because it doesn't involve intermediate lists, in a similar way to how continuations are more performant than `Free` because it doesn't involve intermediate values.

Copy link
Collaborator Author

@kyay10 kyay10 Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it helps, here's a very simple version of the API to play around with (playground). Let me know if you find any way to break it!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question here for me is whether providing this API is useful enough to include it in arrow-core, given that at this point we have no way to enforce that at least one call to add or addAll is done. We really made a big effort to reduce Arrow's API to its minimum, and this PR goes into the opposite direction.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that at this point we have no way to enforce that at least one call to add or addAll is done

Do you mean that buildNonEmptyList doesn't enforce that? Because it does! See the playground link above. It's all compile-time checked, relying on contracts and a lambda with an intersection return type.

I can have a go at reducing the API surface here if you'd like. I was aiming to have an API similar to MutableList/Set (without the remove methods), but I think only a fraction of the API is really necessary, and we can always extend it later

@kyay10 kyay10 requested a review from serras February 8, 2026 20:46
@kyay10
Copy link
Collaborator Author

kyay10 commented Feb 8, 2026

FWIW, I can move the builders to a new module, and add an implementation dependency on it in core. This way, the API isn't exposed to outside users automatically, but we can still rely on it internally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants