[core] Mirja Kühlewind's Discuss on draft-ietf-core-coap-tcp-tls-08: (with DISCUSS)

Discussion:

Mirja Kühlewind

2017-05-10 13:40:31 UTC

Mirja Kühlewind has entered the following ballot position for
draft-ietf-core-coap-tcp-tls-08: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)

Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-core-coap-tcp-tls/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

1) My general concern is that, while I don't necessarily want to block
the proposed format, I would like to understand further before
publication why this approach was chosen. Similar to Ben's discuss, I
don't understand why the format was chosen so differently. You could just
use the format (plus a new length option) as defined for UDP and just
never have any retransmission or reordering but be more flexible on the
lower layer transport to use. However, if you actually prefer a new
format (to save space), then that sounds like a new version for me, while
the draft says:
"CoAP is defined in [RFC7252] with a version number of 1. At this time,
there is no known reason to support version numbers different from 1."
However, in this case it could even have made sense to define a new
format/version that could be used for both underlying protocols and
either have a length option or a message type and IP option.
Further I also don't understand why on the other hand the TCP COAP
framing is re-used for websockets because websockets already provides
message framing and a length field.

Also inline with Ben's discuss, the use of the Block option for CoAP/TCP
is not very clear to me. The draft says:
"a UDP-to-TCP gateway may simply not have the context to convert a
message with a Block Option into the equivalent exchange without
any use of a Block Option (it would need to convert the entire
blockwise exchange from start to end into a single exchange)"
However, given that the COAP/TCP and COAP/UDP format are so different,
it's anyway a more complex conversion than just sticking another
transport underneath. The argument for HOL blocking due to e.g. upgrades
is also not clear to me because you should probably better just use a
different TCP connection for that as it really seems to be a different
use case.

For me this draft looks like you are defining basically a new protocol
version and not just COAP over TCP.

Again, I don't necessarily want to block this but I would like to
understand why the proposed approach was chosen.

2) Comments from the tsv-art review need to be addressed as well (Thanks
to Yoshi Nishida for the review!). Here is the review text for your
convenience:

"Summary: This document is well-written. It is almost ready to be
published as a PS draft once the following points are addressed.

1: It is not clear how the protocol reacts to errors from transport
layers (e.g. connection failure).
Will the protocol just inform apps of the events and let the app
decide what to do, or will the protocol itself do something?

2: There will be situations where the app layer is freezing while the
transport layer is still working. Since transport layers cannot detect
this type of failure, there should be some mechanism for it somewhere
in the protocol or in the app layer. The doc needs to address this
point. For example, what will happen when a PONG message is not
returned for a certain amount of time?

3: Since this draft defines a new SZX value, I think the doc needs to
update RFC7959. This point should be clarified more in the doc."

3) And inline with Yoshi's comment, I don't think this part in section
3.3 is well specified; especially I don't understand how these two things
fit together:
"To avoid unnecessary latency, a Connection Initiator MAY send
additional messages without waiting to receive the Connection
Acceptor's CSM; ..."
and
"Endpoints MUST treat a missing or invalid CSM as a connection error
and abort the connection (see Section 5.6)."
Also how long should I wait until I abort the connection?

Carsten Bormann

2017-05-10 14:28:46 UTC

Hi Mirja,

sorry for answering the IESG input out of order, but since you mostly have questions, I feel I should answer them quickly.

Post by Mirja Kühlewind
----------------------------------------------------------------------
----------------------------------------------------------------------
1) My general concern is that, while I don't necessarily want to block
the proposed format, I would like to understand further before
publication why this approach was chosen. Similar to Ben's discuss, I
don't understand why the format was chosen so differently. You could just
use the format (plus a new length option) as defined for UDP and just
never have any retransmission or reordering but be more flexible on the
lower layer transport to use.

Post by Mirja Kühlewind
However, if you actually prefer a new
format (to save space), then that sounds like a new version for me, while
"CoAP is defined in [RFC7252] with a version number of 1. At this time,
there is no known reason to support version numbers different from 1."

But there is no need to change the UDP format, it will continue to look the same.

Post by Mirja Kühlewind
However, in this case it could even have made sense to define a new
format/version that could be used for both underlying protocols and
either have a length option or a message type and IP option.

We don’t need a length on UDP, and we don’t need Ver/T/MID on TCP.
So that is the difference in the structure of the first four bytes, after which the formats are identical.

Post by Mirja Kühlewind
Further I also don't understand why on the other hand the TCP COAP
framing is re-used for websockets because websockets already provides
message framing and a length field.

After taking out the length (which we don’t need on websockets), as well, we have a four-bit gap.
Using the TCP format and just keeping what was the length field MBZ was the easiest way to handle this.

Post by Mirja Kühlewind
Also inline with Ben's discuss, the use of the Block option for CoAP/TCP
"a UDP-to-TCP gateway may simply not have the context to convert a
message with a Block Option into the equivalent exchange without
any use of a Block Option (it would need to convert the entire
blockwise exchange from start to end into a single exchange)"
However, given that the COAP/TCP and COAP/UDP format are so different,
it's anyway a more complex conversion than just sticking another
transport underneath.

The (pretty much trivial) format conversion is not a source of complexity, the handling of the state machines is.
But that is not a new thing for a proxy at all.
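
As an illustration of how small the format conversion itself is, here is a minimal Python sketch. It is not taken from the draft: the names `TcpFields`, `udp_to_tcp_fields` and `tcp_fields_to_udp` are hypothetical, and the assumption that the TCP length counts only the bytes after the token follows the published CoAP-over-TCP specification rather than anything stated in this thread. The sketch returns fields instead of wire bytes so that it does not commit to where a length extension would sit.

```python
from typing import NamedTuple


class TcpFields(NamedTuple):
    """Fields a CoAP-over-TCP message would carry (hypothetical container)."""
    length: int   # bytes after the token: options, 0xFF marker, payload (assumed)
    tkl: int
    code: int
    token: bytes
    tail: bytes   # options + optional payload marker + payload, copied unchanged


def udp_to_tcp_fields(datagram: bytes) -> TcpFields:
    """Drop Ver, T and Message ID; keep TKL, Code and everything after byte 4."""
    if len(datagram) < 4:
        raise ValueError("CoAP over UDP needs a 4-byte fixed header")
    tkl = datagram[0] & 0x0F
    if tkl > 8:
        raise ValueError("TKL values 9-15 are reserved")
    if len(datagram) < 4 + tkl:
        raise ValueError("truncated token")
    token = datagram[4:4 + tkl]
    tail = datagram[4 + tkl:]
    return TcpFields(len(tail), tkl, datagram[1], token, tail)


def tcp_fields_to_udp(fields: TcpFields, msg_type: int, message_id: int) -> bytes:
    """Rebuild a UDP message; the caller must supply T and a Message ID."""
    if not 0 <= msg_type <= 3:
        raise ValueError("T is a 2-bit field")
    if not 0 <= message_id <= 0xFFFF:
        raise ValueError("Message ID is a 16-bit field")
    first = (1 << 6) | (msg_type << 4) | fields.tkl   # Ver is fixed to 1
    header = bytes([first, fields.code]) + message_id.to_bytes(2, "big")
    return header + fields.token + fields.tail
```

The UDP-bound direction needs a message type and a Message ID from the caller. Choosing and tracking those is per-hop state, which is the state-machine work referred to above, not part of the byte shuffling.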

Post by Mirja Kühlewind
The argument for HOL blocking due to e.g. upgrades
is also not clear to me because you should probably better just use a
different TCP connection for that as it really seems to be a different
use case.

One could do that (often requiring the cloud-based component to incite the constrained device to set up a new connection), or one can simply use the established Block protocol.

Post by Mirja Kühlewind
For me this draft looks like you are defining basically a new protocol
version and not just COAP over TCP.

As described above, almost all of the protocol is the same.

(The pictures are probably a bit misleading, as they don’t show the juicy parts of the protocol, which are in the code and options processing.)

Post by Mirja Kühlewind
2) Comments from the tsv-art review need to be addressed as well (Thanks
to Yoshi Nishida for the review!). Here is the review text for your
"Summary: This document is well-written. It is almost ready to be
published as a PS draft once the following points are addressed.
1: It is not clear how the protocol reacts to errors from transport
layers (e.g. connection failure).
The protocol will just inform apps of the events and the app will
decide what to do or the protocol itself will do something?

Indeed, the protocol does not define what to do.
(This is similar to other application protocols on top of TCP, say, HTTP or FTP.)

We probably should be more explicit about that fact.

Post by Mirja Kühlewind
2: There will be situations where the app layer is freezing while the
transport layer is still working. Since transport layers cannot detect
this type of failure, there should be some mechanism for it somewhere
in the protocol or in the app layer. The doc needs to address this
point. For example, what will happen when a PONG message is not
returned for a certain amount of time?

Again, that is in the hands of the application.

Post by Mirja Kühlewind
3: Since this draft defines a new SZX value, I think the doc needs to
update RFC7959. This point should be clarified more in the doc."

We will follow what IESG recommends on the question whether the new document “updates” (technical term) RFC 7959.

Post by Mirja Kühlewind
3) And inline with Yoshi's comment, I don't think this part in section
3.3 is well specified; especially I don't understand how these two things
"To avoid unnecessary latency, a Connection Initiator MAY send
additional messages without waiting to receive the Connection
Acceptor's CSM; ..."
and
"Endpoints MUST treat a missing or invalid CSM as a connection error
and abort the connection (see Section 5.6)."
Also how long should I wait until I abort the connection?

Mirja Kühlewind

2017-05-10 15:13:46 UTC

Hi Carsten,

thanks for your quick reply. I just want to quickly add something on this
point; I will provide a longer reply later.

Post by Carsten Bormann
The (pretty much trivial) format conversion is not a source of complexity, the handling of the state machines is.

If you keep the same format you can also keep the machinery, it's just that
if you use TCP underneath you will never need to make a retransmission or any
reordering. However, with your protocol changes you basically need
completely new machinery.

Mirja

Jim Schaad

2017-05-10 15:43:46 UTC

Some of my opinions inline.

Jim

-----Original Message-----
From: core [mailto:core-***@ietf.org] On Behalf Of Carsten Bormann
Sent: Wednesday, May 10, 2017 7:29 AM
To: Mirja Kühlewind <***@kuehlewind.net>
Cc: core-***@ietf.org; The IESG <***@ietf.org>; ***@ietf.org; draft-ietf-core-coap-tcp-***@ietf.org
Subject: Re: [core] Mirja Kühlewind's Discuss on draft-ietf-core-coap-tcp-tls-08: (with DISCUSS)

Hi Mirja,

sorry for answering the IESG input out of order, but since you mostly have questions, I feel I should answer them quickly.

Post by Mirja Kühlewind
----------------------------------------------------------------------
----------------------------------------------------------------------
1) My general concern is that, while I don't necessarily want to block
the proposed format, I would like to understand further before
publication why this approach was chosen. Similar to Ben's discuss, I
don't understand why the format was chosen so differently. You could
just use the format (plus a new length option) as defined for UDP and
just never have any retransmission or reordering but be more flexible
on the lower layer transport to use.

The CoAP (four-byte) fixed header is to a large extent concerned with the UDP-based reliability layer of UDP CoAP.
Keeping around fields that have lost their function is a recipe for interoperability issues.

Let’s go through the UDP format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver| T |  TKL  |      Code     |          Message ID           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Token (if any, TKL bytes) ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Options (if any) ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 1 1 1 1 1 1 1|    Payload (if any) ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
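
For readers who prefer code to bit diagrams, the layout above can be read roughly as follows. This is a minimal Python sketch under stated assumptions, not code from any CoAP implementation: `UdpHeader` and `parse_udp_message` are hypothetical names, and options and payload are left unparsed.

```python
import struct
from typing import NamedTuple


class UdpHeader(NamedTuple):
    """Decoded fixed header plus token of a CoAP-over-UDP message."""
    version: int      # Ver, 2 bits
    msg_type: int     # T, 2 bits (0=CON, 1=NON, 2=ACK, 3=RST)
    tkl: int          # token length, 4 bits
    code: int         # 8 bits
    message_id: int   # 16 bits, network byte order
    token: bytes
    rest: bytes       # options, optional 0xFF marker and payload, unparsed


def parse_udp_message(datagram: bytes) -> UdpHeader:
    if len(datagram) < 4:
        raise ValueError("CoAP over UDP needs a 4-byte fixed header")
    first, code, message_id = struct.unpack("!BBH", datagram[:4])
    version = first >> 6
    msg_type = (first >> 4) & 0x03
    tkl = first & 0x0F
    if tkl > 8:
        raise ValueError("TKL values 9-15 are reserved")
    if len(datagram) < 4 + tkl:
        raise ValueError("truncated token")
    token = datagram[4:4 + tkl]
    return UdpHeader(version, msg_type, tkl, code, message_id,
                     token, datagram[4 + tkl:])
```

Of the fields decoded here, `version`, `msg_type` and `message_id` are the ones discussed next as candidates for removal on TCP.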

The parts that need to change:

Ver is no longer needed, as it doesn’t need to be per message on a reliable transport.

[JLS] The reason that Ver is no longer needed has nothing to do with the fact that this is a reliable transport. It has to do with the fact that you have decided to set up multiple streams and assign one version number per stream. The same thing could potentially be done with UDP as well. I think it would have made more sense to keep the version field and allow multiple versions in a single stream just like is done for UDP.

T is about reliability models; we don’t need that on TCP, which imposes its own reliability model.

[JLS] While this is true, the fact that there is no discussion on what should be happening when you change reliability models for a message in transit is something that I have always found troublesome. Additionally, there are some semantics about these which are not completely related to reliability. It is possible to send messages across for which it is reasonable that a server not ever respond back if the request contains an error or would have an empty answer. This ability has been lost.

Message ID also is about retransmissions and protection from packet duplication, which don’t exist on TCP.

[JLS] On this I agree - Message ID has always been about a single hop.

The majority of the format is unchanged:
TKL, Code, Token, Options, Payload Marker (0xFF), Payload are unchanged.

Now add a length for framing. The result:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Len  |  TKL  |      Code     |   Token (if any, TKL bytes) ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Options (if any) ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 1 1 1 1 1 1 1|    Payload (if any) ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Maybe we should have formatted this as:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Len  |  TKL  |      Code     |   Length extension (if any)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Token (if any, TKL bytes) ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Options (if any) ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 1 1 1 1 1 1 1|    Payload (if any) ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
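
How a 4-bit Len can be combined with a length extension may be easier to see in code. The Python sketch below covers only the length value itself and deliberately says nothing about where the extension bytes sit relative to Code. The thresholds (nibble values 13, 14, 15 selecting 1, 2 or 4 extension bytes with offsets 13, 269 and 65805) are an assumption taken from the published CoAP-over-TCP specification, since this message does not spell them out. The names `encode_length` and `decode_length` are hypothetical.

```python
from typing import Tuple

# Len nibble -> (number of extension bytes, offset added to the extension value).
# Assumed values; see the note above.
_EXTENSION = {13: (1, 13), 14: (2, 269), 15: (4, 65805)}


def encode_length(n: int) -> Tuple[int, bytes]:
    """Return (Len nibble, extension bytes) for a length of n bytes."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 13:
        return n, b""                      # small lengths fit in the nibble
    for nibble, (size, offset) in _EXTENSION.items():
        if n - offset < 1 << (8 * size):   # first extension size that fits
            return nibble, (n - offset).to_bytes(size, "big")
    raise ValueError("length too large for a 32-bit extension")


def decode_length(nibble: int, ext: bytes = b"") -> int:
    """Inverse of encode_length."""
    if not 0 <= nibble <= 15:
        raise ValueError("Len is a 4-bit field")
    if nibble < 13:
        if ext:
            raise ValueError("no extension expected for Len < 13")
        return nibble
    size, offset = _EXTENSION[nibble]
    if len(ext) != size:
        raise ValueError("wrong extension size for this Len value")
    return int.from_bytes(ext, "big") + offset
```

Under these assumptions a message with at most 12 bytes of options and payload costs no extra length bytes at all, which is the space argument for the compact header.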

Post by Mirja Kühlewind
However, if you actually prefer a new
format (to save space), then that sounds like a new version for me,
"CoAP is defined in [RFC7252] with a version number of 1. At this
time, there is no known reason to support version numbers different from 1."

But there is no need to change the UDP format, it will continue to look the same.

We don’t need a length on UDP, and we don’t need Ver/T/MID on TCP.
So that is the difference in the structure of the first four bytes, after which the formats are identical.

Post by Mirja Kühlewind
Also inline with Ben's discuss, the use of the Block option for
"a UDP-to-TCP gateway may simply not have the context to convert a
message with a Block Option into the equivalent exchange without
any use of a Block Option (it would need to convert the entire
blockwise exchange from start to end into a single exchange)"
However, given that the COAP/TCP and COAP/UDP format are so different,
it's anyway a more complex conversion than just sticking another
transport underneath.

The (pretty much trivial) format conversion is not a source of complexity, the handling of the state machines is.
But that is not a new thing for a proxy at all.

Post by Mirja Kühlewind
The argument for HOL blocking due to e.g. upgrades is also not clear
to me because you should probably better just use a different TCP
connection for that as it really seems to be a different use case.

One could do that (often requiring the cloud-based component to incite the constrained device to set up a new connection), or one can simply use the established Block protocol.

Post by Mirja Kühlewind
For me this draft looks like you are defining basically a new protocol
version and not just COAP over TCP.

Post by Mirja Kühlewind
2) Comments from the tsv-art review need to be addressed as well
(Thanks to Yoshi Nishida for the review!). Here is the review text for
your
"Summary: This document is well-written. It is almost ready to be
published as a PS draft once the following points are addressed.
1: It is not clear how the protocol reacts to errors from transport
layers (e.g. connection failure).
The protocol will just inform apps of the events and the app will
decide what to do or the protocol itself will do something?

Indeed, the protocol does not define what to do.
(This is similar to other application protocols on top of TCP, say, HTTP or FTP.)

We probably should be more explicit about that fact.

Post by Mirja Kühlewind
2: There will be situations where the app layer is freezing while the
transport layer is still working. Since transport layers cannot detect
this type of failures, there should be some mechanisms for it
somewhere in the protocol or in the app layer. The doc needs to
address this point. For example, what will happen when a PONG message
is not returned for a certain amount of time?

Again, that is in the hands of the application.

Post by Mirja Kühlewind
3: Since this draft defines a new SZX value, I think the doc needs to
update RFC7959. This point should be clarified more in the doc."

We will follow what IESG recommends on the question whether the new document “updates” (technical term) RFC 7959.

Post by Mirja Kühlewind
3) And inline with Yoshi's comment, I don't think this part in section
3.3 is well specified; especially I don't understand how these two
"To avoid unnecessary latency, a Connection Initiator MAY send
additional messages without waiting to receive the Connection
Acceptor's CSM; ..."
and
"Endpoints MUST treat a missing or invalid CSM as a connection error
and abort the connection (see Section 5.6)."
Also how long should I wait until I abort the connection?

I think that this is similar to:
How long does an HTTP server wait for the first request?

Pointing out that this is all in the hands of the application is probably a good editorial improvement.
Now https://github.com/core-wg/coap-tcp-tls/issues/149

Grüße, Carsten

_______________________________________________
core mailing list
***@ietf.org
https://www.ietf.org/mailman/listinfo/core

Carsten Bormann

2017-05-10 15:56:43 UTC

Post by Carsten Bormann
Ver is no longer needed, as it doesn’t need to be per message on a reliable transport.
[JLS] The reason that Ver is no longer needed has nothing to do with the fact that this is a reliable transport. It has to do with the fact that you have decided to set up multiple streams and assign one version number per stream. The same thing could potentially be done with UDP as well. I think it would have made more sense to keep the version field and allow multiple versions in a single stream just like is done for UDP.

Fundamentally, version numbers are overrated. CoAP versions are not for counting through small behavior changes.
I struggle to find a reason to mix multiple format versions on one stream. CoAP-TCP is doing the right thing here.

Post by Carsten Bormann
T is about reliability models; we don’t need that on TCP, which imposes its own reliability model.
[JLS] While this is true, the fact that there is no discussion on what should be happening when you change reliability models for a message in transit is something that I have always found troublesome. Additionally, there are some semantics about these which are not completely related to reliability. It is possible to send messages across for which it is reasonable that a server not ever respond back if the request contains an error or would have an empty answer. This ability has been lost.

Well, that is a comment on CoAP, not on the present specification. We don’t have a way to indicate what reliability model we expect a proxy to use for its forwarded request. That is usually simply not for the client to decide. There are some approaches that seem to imply that a UDP proxy must use exactly the same reliability model for the forwarded request as was used for the client request, and that may actually be a good implementation strategy if you don’t want to address the issue more thoroughly. But a Non-confirmable request coming in from an on-link client on an Ethernet has a quite different reliability to it than even a Confirmable request on a very flaky low-power wireless link.

(Other standards in this space have “QoS classes”, which I don’t even want to start to berate here.)

Should a client be able to instruct a proxy about the next-hop reliability? If yes, there is more to this than the NON/CON distinction, and we should do a real solution for this. Not here.

Grüße, Carsten

Adam Roach

2017-05-10 16:24:50 UTC

Post by Carsten Bormann
The (pretty much trivial) format conversion is not a source of complexity, the handling of the state machines is.
But that is not a new thing for a proxy at all.

You state this like the complexity of adapting two different state
machines to each other is unavoidably inherent to a proxy that performs
these kinds of transport conversions; but it's not.

The only reason this complexity arises for CoAP is because the design in
this document requires it. If you simply encapsulated UDP over TCP with
simple framing, and left handling of reordering and deduplication
identical to what currently exists for CoAP over UDP, then the proxy
could shuttle messages from one side to the other, with the only needed
state being that directly related to the TCP connection. (And if you
left in the fields that you find unnecessary in TCP, it would be able to
do it with far fewer memory copies, which would make for significantly
better scalability; and, as a bonus, you wouldn't need multiple sets of
parsing and marshalling code in endpoints that support both UDP and TCP).
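
To make the "simple framing" alternative concrete, here is a minimal Python sketch of carrying unmodified CoAP-over-UDP messages on a byte stream. It is only an illustration: the 2-byte big-endian length prefix is a hypothetical choice, not something proposed in this thread, and `frame` and `deframe` are made-up names.

```python
import struct
from typing import List, Tuple


def frame(datagram: bytes) -> bytes:
    """Prefix an unmodified UDP-format message with its 16-bit length."""
    if len(datagram) > 0xFFFF:
        raise ValueError("message too large for a 16-bit length prefix")
    return struct.pack("!H", len(datagram)) + datagram


def deframe(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split received stream bytes into complete messages.

    Returns the complete messages and the unconsumed remainder, which the
    caller keeps and prepends to the next chunk read from the connection.
    """
    messages = []
    pos = 0
    while len(buffer) - pos >= 2:
        (size,) = struct.unpack_from("!H", buffer, pos)
        if len(buffer) - pos - 2 < size:
            break                          # message not fully received yet
        messages.append(buffer[pos + 2:pos + 2 + size])
        pos += 2 + size
    return messages, buffer[pos:]
```

With framing like this, a forwarder only needs the leftover-bytes buffer per TCP connection. The message bytes pass through untouched in both directions.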

Proxies between different transports don't *have* to be complicated.

/a

Carsten Bormann

2017-05-10 16:38:20 UTC

You state this like the complexity of adapting two different state machines to each other is unavoidably inherent to a proxy that performs these kinds of transport conversions; but it's not.
The only reason this complexity arises for CoAP is because the design in this document requires it. If you simply encapsulated UDP over TCP with simple framing, and left handling of reordering and deduplication identical to what currently exists for CoAP over UDP, then the proxy could shuttle messages from one side to the other, with the only needed state being that directly related to the TCP connection. (And if you left in the fields that you find unnecessary in TCP, it would be able to do it with far fewer memory copies, which would make for significantly better scalability; and, as a bonus, you wouldn't need multiple sets of parsing and marshalling code in endpoints that support both UDP and TCP).
Proxies between different transports don't *have* to be complicated.

There are indeed simple cases where UDP-to-UDP proxies can be implemented as UDP payload forwarders.
Unfortunately, things become a bit more complex as soon as the proxy has some fan-in (more than one client), or actually wants to provide some function such as caching.

For TCP, the WG decided we wanted to use TCP's reliability features instead of re-using our own on top of using TCP just as a datagram forwarding mechanism. This favors the simplicity of an end-system over that of certain simple cases of a proxy. (Which is in line with other design decisions we have made about allotting complexity to different places in the architecture.)

Grüße, Carsten

Mirja Kühlewind

2017-05-10 16:49:18 UTC

Post by Carsten Bormann
For TCP, the WG decided we wanted to use TCP's reliability features instead of re-using our own on top of using TCP just as a datagram forwarding mechanism. This favors the simplicity of an end-system over that of certain simple cases of a proxy. (Which is in line with other design decisions we have made about allotting complexity to different places in the architecture.)

You cannot use TCP without its reliability features. Having additional
reliability in the higher layer simply means the reliability is not used.

Carsten Bormann

2017-05-10 19:05:45 UTC

You cannot use TCP without its reliability features. Having additional reliability in the higher layer simply means the reliability is not used.

… or that the features are suddenly viewed as conveying application (or even end-to-end) reliability.
Much safer to remove them.

Grüße, Carsten