Adam Roach
2017-05-09 04:51:24 UTC
Adam Roach has entered the following ballot position for
draft-ietf-core-coap-tcp-tls-08: Discuss
When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)
Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.
The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-core-coap-tcp-tls/
----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------
- Part of the document is outside the scope of the charter of the WG
which requested its publication
While I understand that this document requires a WebSockets mechanism for
.well-known, and that such a mechanism doesn’t yet exist, it seems pretty
far out of scope for the CORE working group to take on defining this
itself (unless I missed something in its charter, which is entirely
possible: it’s quite long). Specifically, I fear that this venue is
unlikely to bring such a change to the attention of those people best
positioned to comment on whether .well-known is appropriate for
WebSockets.
Even if this is in scope for CORE, it really needs to be its own
document. If some future document comes along at a later point and wants
to make use of its own .well-known path with WebSockets, it would be
really quite strange to require it to reference this document in
describing .well-known for WS.
----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------
General — this is a very bespoke approach to what could have been mostly
solved with a single four-byte “length” header; it is complicated both on
the wire and in implementation; and the format variations among CoAP over
UDP, coap+tls, and coap+ws are going to make gateways much harder to
implement and less efficient (as they will necessarily have to
disassemble messages and rebuild them to change between formats). The
protocol itself mentions gateways in several places, but does not discuss
how they are expected to map among the various flavors of CoAP defined in
this document. Some of the changes seem unnecessary, but it could be that
I’m missing the motivation for them. Ideally, the introduction would work
harder at explaining why CoAP over these transports is as different from
CoAP over UDP as it is, focusing in particular on why the complexity of
having three syntactically incompatible headers is justified by the
benefits provided by such variations.
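For concreteness, here is a rough sketch of the simpler alternative I
have in mind: length-prefix framing of otherwise unmodified
CoAP-over-UDP messages. The four-byte length field and the function
names are illustrative only, not anything from the draft.

```python
import struct

def frame(coap_message: bytes) -> bytes:
    """Prefix an unmodified CoAP-over-UDP message with a four-byte
    big-endian length so it can travel over a byte stream."""
    return struct.pack("!I", len(coap_message)) + coap_message

def deframe(stream: bytes) -> list:
    """Split a received byte stream back into the original messages.

    A real implementation would buffer partial reads; this sketch
    assumes the whole stream is available.
    """
    messages = []
    offset = 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        offset += 4
        messages.append(stream[offset:offset + length])
        offset += length
    return messages
```

A gateway built on this scheme never has to reparse or rebuild a
message to move it between UDP and TCP; it only adds or strips the
length prefix.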
Additionally, it’s not clear from the introduction what the motivation
for using the mechanisms in this document is as compared to the
techniques described in section 10 (and its subsections) of RFC 7252.
With the exception of subscribing to resource state (which could be
added), it seems that such an approach is significantly easier to
implement and more clearly defined than what is in this document; and it
appears to provide the combined benefits of all four transports discussed
in this document. My concern here is that an explosion of transport
options makes it less likely that a client and server can find two in
common: the limit of the probability of two implementations having a
transport in common as the number of transports approaches infinity is
zero. Due to this likely decrease in interoperability, I’d expect to see
some pretty powerful motivation in here for defining a third, fourth,
fifth, and sixth way to carry CoAP when only TCP is available (I count
RFC 7252 http and https as the first and second ways in this
accounting).
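To illustrate the interoperability concern with a deliberately
simplified model: if each implementation independently picked exactly
one of n available transports uniformly at random, the chance that a
client and server could talk at all would be 1/n, which shrinks toward
zero as n grows. Real implementations support more than one transport,
so this is only a toy calculation, but it shows the direction of the
pressure.

```python
from fractions import Fraction

def p_common_transport(n_transports: int) -> Fraction:
    """Toy model: each side independently implements exactly one of
    n transports, chosen uniformly; they interoperate only if they
    happen to pick the same one, with probability 1/n."""
    return Fraction(1, n_transports)

# Two transports: even odds.  Six transports (the count above): 1 in 6.
```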
I’m also a bit puzzled that CoAP already has an inherent mechanism for
blocking messages off into chunks, which this document circumvents for
TCP connections (by allowing Max-Message-Size to be increased), and then
is forced to offer remedies for the resultant head-of-line blocking
issues. Had this feature not been introduced, messages with a two-byte
token add six bytes of overhead for every 1024 bytes of content — less
than 0.6% size inflation. It seems like a lot of complicated machinery —
which has a built-in foot-gun that you have to warn people about misusing
— for a very tiny gain. I know it’s relatively late in the process, but
if these trade-offs haven't had a lot of discussion yet, it’s probably
worth at least giving them some additional thought.
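The overhead arithmetic above is easy to check. The six-byte figure is
my assumption for the per-message cost of a header plus a two-byte
token on each 1024-byte block.

```python
BLOCK_SIZE = 1024       # largest regular block size in block-wise transfer
PER_BLOCK_OVERHEAD = 6  # assumed per-message cost: header plus two-byte token

inflation = PER_BLOCK_OVERHEAD / BLOCK_SIZE  # about 0.59%, under 0.6%
```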
I’ll note that the entire BERT mechanism seems to fall into the same trap
of adding extra complexity for virtually nonexistent savings. CoAP
headers are, by design, tiny. It seems like a serious over-optimization
to try to eliminate them in this fashion. In particular, you’re making
the actual implementation code larger to save a trivial number of bits on
the wire; I was under the impression that many of the implementation
environments CoAP is intended for had some serious on-chip restrictions
that would point away from this kind of additional complexity.
Specific comments follow.
Section 3.3, paragraph 3 says that an initiator may send messages prior
to receiving the remote side’s CSM, even though the message may be larger
than would be allowed by that CSM. What should the recipient of an
oversized message do in this case? In fact, I don’t see in here what a
recipient of a message larger than it allowed for in its CSM is supposed
to do in response at *any* stage of the connection. Is it an error? If
so, how do you indicate it? Or is the Max-Message-Size option just a
suggestion for the other side? This definitely needs clarification.
(Aside — it seems odd and somewhat backwards that TCP connections are
provided an affordance for fine-grained control over message sizes, while
UDP communications are not.)
Section 4.4 has a prohibition against using WebSockets keepalives in
favor of using CoAP ping/pong. Section 3.4 has no similar prohibition
against TCP keepalives, while the rationale would seem to be identical.
Is this asymmetry intentional? (I’ll also note that the presence of
keepalive mechanisms in both TCP and WebSockets would seem to make the
addition of new CoAP primitives for the same purpose unnecessary, but I
suspect this has already been debated).
Section 5 and its subsections define a new set of message types,
presumably for use only on connection-oriented protocols, although this
is only implied, and never stated. For example, some implementors may see
CSM, Ping, and Pong as potentially useful in UDP; and, finding no
prohibition in this document against using them, decide to give it a go.
Is that intended? If not, I strongly suggest an explicit prohibition
against using these in UDP contexts.
Section 5.3.2 says that implementations supporting block-wise transfers
SHOULD indicate the Block-wise Transfer Option. I can't figure out why
this is anything other than a "MUST". It seems odd that this document
would define a way to communicate this, and then choose to leave the
communicated options as “YES” and “YOUR GUESS IS AS GOOD AS MINE” rather
than the simpler and more useful “YES” and “NO”.
I find the described operation of the Custody Option in the operation of
Ping and Pong to be somewhat problematic: it allows the Pong sender to
unilaterally decide to set the Custody Option, and consequently
quarantine the Pong for an arbitrary amount of time while it processes
other operations. This seems impossible to distinguish from a
failure-due-to-timeout from the perspective of the Ping sender. Why not
limit this behavior only to Ping messages that include the Custody
Option?
I find the unmotivated definition of the default port for “coaps+tcp” to
443 — a port that is already assigned to https — to be surprising, to put
it mildly. This definitely needs motivating text, and I suspect it's
actually wrong.
I am similarly perplexed by the hard-coded “must do ALPN *unless* the
designated port takes the magical value 5684” behavior. I don’t think
I’ve ever seen a protocol that has such variation based on a hard-coded
port number, and it seems unlikely to be deployed correctly (I’m imagining
the frustration of: “I changed both the server and the client
configuration from the default port of 5684 to 49152, and it just stopped
working. Like, literally the *only* way it works is on port 5684. I've
checked firewall settings everywhere and don't see any special handling
for that port -- I just can't figure this out, and it's driving me
crazy.”). Given the nearly universal availability of ALPN in pretty much
all modern TLS libraries, it seems much cleaner to just require ALPN
support and call it done. Or *don’t* require ALPN at all and call it
done. But *changing* protocol behavior based on magic port numbers seems
like it’s going to cause a lot of operational heartburn.
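To spell out the special case as I read it (the names here are mine,
not the draft's): the ALPN requirement flips on a single hard-coded
port value, which is exactly the kind of branch that gets
misconfigured in deployment.

```python
MAGIC_COAPS_TCP_PORT = 5684  # the one port exempted from the ALPN requirement

def alpn_required(port: int) -> bool:
    """As I read the draft: ALPN is mandatory on every port
    except the magic value 5684."""
    return port != MAGIC_COAPS_TCP_PORT

# Moving both endpoints from 5684 to, say, 49152 silently changes
# whether the TLS handshake must carry ALPN, even though the operator
# changed nothing else.
```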
The final paragraph of section 8.1 is very confusing, making it somewhat
unclear which of the three modes must be implemented on a CoAP client,
and which must be implemented on a CoAP server. Read naïvely, this sounds
like clients are required to do only one (but one of their choosing) of
these three, while servers are required to also do only one (again, of
their choosing). It seems that the chance of finding devices that could
interoperate under such circumstances is going to be relatively low: to
work together, you would have to find a client and a server that happened
to make the same implementation choice among these three. What I’m used
to in these kinds of cases is: (a) server must implement all, client can
choose to implement only one (or more), (b) client must implement all,
server can choose to implement only one (or more), or (c) client and
server must implement a specifically named lowest common denominator, and
can negotiate up from there. Pretty much anything else (aside from
strange “everyone must implement two of three” schemes) will end up with
interop issues.
Although the document clearly expects the use of gateways and proxies
between these connection-oriented usages of CoAP and UDP-based CoAP,
Appendix A seems to omit discussion or consideration of how this
gatewaying can be performed. The following list of problems is
illustrative of this larger issue, but likely not exhaustive. (I'll note
that all of these issues evaporate if you move to a simpler scheme that
merely frames otherwise unmodified UDP CoAP messages.)
Section A.1 does not indicate what gateways are supposed to do with
out-of-order notifications. The TCP side requires these to be delivered
in order; does this mean that a gateway observing a gap in sequence
numbers needs to quarantine the newly received message so that it can
deliver the missing one first? Or does it deliver the newly received
message and then discard the “stale” one when it arrives? I don’t think
that leaving this up to implementations is particularly advisable.
Section A.3 is a bit more worrisome. I understand the desired
optimization here, but where you reduce traffic in one direction, you run
the risk of exploding it in the other. For example, consider a coap+tcp
client connecting to a gateway that communicates with a CoAP-over-UDP
server. When that client wants to check the health of its observations,
it can send a Ping and receive a Pong that confirms that they are all
alive and well. In order to be able to send a Pong that *means* “all your
observations are alive and well,” the gateway has to verify that all the
observations are alive and well. A simple implementation of a gateway
will likely check on each observed resource individually when it gets a
Ping, and then send a Pong after it hears back about all of them. So, as
a client, I can set up, let’s say, two dozen observations through this
gateway. Then, with each Ping I send, the gateway sends two dozen checks
towards the server. This kind of message amplification attack is an
awesome way to DoS both the gateway and the server. I believe the
document needs a treatment of how UDP/TCP gateways handle notification
health checks, along with techniques for mitigating this specific
attack.
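The amplification factor in this scenario is simple to quantify under
the naive-gateway assumption above: each single-message Ping from the
client fans out into one UDP health check per observation. The numbers
below are just the illustration from the text.

```python
observations = 24        # two dozen observations, as in the example above
pings_from_client = 10   # arbitrary count of Pings sent by the client

udp_checks_sent = observations * pings_from_client  # messages toward the server
amplification = udp_checks_sent / pings_from_client  # 24x per Ping
```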
Section A.4 talks about the rather different ways of dealing with
unsubscribing from a resource. Presumably, gateways that get a reset to a
notification are expected to synthesize a new GET to deregister on behalf
of the client? Or is it okay if they just pass along the reset, and
expect the server to know that it means the same thing as a
deregistration? Without explicit guidance here, I expect server and
gateway implementors to make different choices and end up with a lack of
interop.
** There is 1 instance of too long lines in the document, the longest one
being 3 characters in excess of 72.