Encoding management in PEtALS

cdeneux-4
Character encoding management in PEtALS currently works as follows:
- incoming messages are converted to UTF-8 when they arrive on the bus, regardless of their original encoding.
- outgoing messages are therefore always encoded in UTF-8.
- only the attachments keep their original encoding.
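
To make the current behaviour concrete, here is a minimal sketch of the conversion to UTF-8 described above. It is not the PEtALS code: the class and method names are hypothetical, and it assumes the original charset of the payload is already known. For an XML payload, the encoding declared in the XML declaration would also have to be updated, which this sketch does not do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PEtALS.
final class Utf8Normalizer {

    /** Re-encodes a payload from its original charset to UTF-8. */
    static byte[] toUtf8(byte[] payload, Charset sourceCharset) {
        // Decode with the original charset, then encode again as UTF-8.
        String decoded = new String(payload, sourceCharset);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the last letter.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = toUtf8(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length + " -> " + utf8.length); // prints "4 -> 5"
    }
}
```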

This policy could be improved to let users specify an encoding policy, such as:
- a fixed output encoding for a specific SU (service unit).
- an output message encoding identical to the incoming message encoding.
- any other interesting policy...

These policies could be implemented in the components (by detecting the encoding of the incoming messages, recording it in metadata, and applying it at the output of the bus), or in the kernel or the CDK.
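
The policies listed above could be sketched as follows. This is only an illustration of how the output encoding might be chosen: the enum, its constants and the `resolve` method are hypothetical names, and nothing here exists in the kernel or the CDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the policies discussed above.
final class EncodingPolicySketch {

    enum Policy {
        ALWAYS_UTF8,   // current behaviour
        FIXED_PER_SU,  // output encoding configured on the SU
        SAME_AS_INPUT  // output encoding follows the incoming message
    }

    /**
     * Chooses the output charset. inputEncoding is the encoding detected on
     * the incoming message, suEncoding the one configured on the SU; either
     * may be null, in which case UTF-8 is used as a fallback.
     */
    static Charset resolve(Policy policy, Charset inputEncoding, Charset suEncoding) {
        switch (policy) {
            case FIXED_PER_SU:
                return suEncoding != null ? suEncoding : StandardCharsets.UTF_8;
            case SAME_AS_INPUT:
                return inputEncoding != null ? inputEncoding : StandardCharsets.UTF_8;
            default:
                return StandardCharsets.UTF_8;
        }
    }
}
```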

Does anybody have ideas on this question?

PS: Some work has been done on the file transfer component to implement this at the component level, but it has not been committed...




-------------------- m2f --------------------

Read this forum topic online here:
http://petals.ebmwebsourcing.com/forum/viewtopic.php?p=322#322

-------------------- m2f --------------------

_______________________________________________
General mailing list
[hidden email]
http://forum-list.petalslink.org/cgi-bin/mailman/listinfo/general

Re: encoding management in PEtALS

cdeneux-4
My idea is to use the encoding specified in the XML document…

Since transformers don’t set the encoding correctly (the encoding has to be passed as a parameter of the transformer), an extended Transformer (a Transformer wrapper) could read the first bytes of the document, which define the encoding, and set it on the real Transformer before asking it to perform the actual transformation.
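
A rough sketch of such a wrapper, using only standard JAXP and StAX calls. The class and method names are hypothetical, the identity transformation stands in for whatever transformation the component really performs, and falling back to UTF-8 when no encoding is declared is a simplification.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical wrapper: reads the declared encoding, then configures the real Transformer.
final class EncodingPreservingTransformer {

    /** Returns the encoding declared in the XML declaration, or "UTF-8" if none is declared. */
    static String declaredEncoding(byte[] xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(xml));
        try {
            // Null when the XML declaration carries no encoding attribute.
            String declared = reader.getCharacterEncodingScheme();
            return declared != null ? declared : "UTF-8";
        } finally {
            reader.close();
        }
    }

    /** Runs an identity transformation whose output keeps the encoding declared in the input. */
    static byte[] transform(byte[] xml) throws XMLStreamException, TransformerException {
        Transformer delegate = TransformerFactory.newInstance().newTransformer();
        // Pass the detected encoding to the real Transformer before transforming.
        delegate.setOutputProperty(OutputKeys.ENCODING, declaredEncoding(xml));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        delegate.transform(new StreamSource(new ByteArrayInputStream(xml)), new StreamResult(out));
        return out.toByteArray();
    }
}
```

Both the input and the output are handled as bytes here, so no intermediate String conversion can alter the encoding along the way.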

If a user wants to change the encoding, for some obscure reason, the XSLT component or an Interceptor can do it.

I don’t think this case will occur often, so I don’t think another CDK/Container property is really needed; there are already enough properties that PEtALS users rarely use or know about.

Adrien





Re: encoding management in PEtALS

cdeneux-4
In reply to this post by cdeneux-4
Maybe we should investigate why the DOM or SAX parsers (Xerces, Saxon?) don't return the correct encoding before trying to write a wrapper around them.

For me, a property on the MessageExchange that every component working with XML data can take into account is the most configurable approach.
A component (the BC receiving the incoming flow) can set the property according to its internal mechanism (transformer, DocumentBuilder, stream transfer...); the property is then passed along with the MessageExchange (EIP/orchestra), and each component can use it or reset it to any value it wants.
When the flow reaches its end (outgoing flow), the last BC uses this property to set the outgoing encoding if necessary.
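
The flow described above could look roughly like this. To keep the sketch self-contained, a plain Map stands in for the MessageExchange properties (in JBI these would be accessed through MessageExchange.setProperty and getProperty); the property name and the class are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the Map stands in for the MessageExchange properties.
final class ExchangeEncodingProperty {

    // Hypothetical property name, not defined by JBI or PEtALS.
    static final String ENCODING_PROPERTY = "example.message.encoding";

    /** Inbound BC: records the encoding it detected on the incoming flow. */
    static void onInbound(Map<String, Object> exchangeProperties, Charset detected) {
        exchangeProperties.put(ENCODING_PROPERTY, detected.name());
    }

    /** Outbound BC: encodes the already decoded text using the property, or UTF-8 if it is unset. */
    static byte[] onOutbound(Map<String, Object> exchangeProperties, String text) {
        Object name = exchangeProperties.get(ENCODING_PROPERTY);
        Charset charset = name == null ? StandardCharsets.UTF_8 : Charset.forName(name.toString());
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        Map<String, Object> properties = new HashMap<>();
        onInbound(properties, StandardCharsets.ISO_8859_1);
        // An intermediate component may reset the property to another value.
        properties.put(ENCODING_PROPERTY, "UTF-16BE");
        System.out.println(onOutbound(properties, "caf\u00e9").length); // prints "8"
    }
}
```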

I think it is important to be able to configure it, because when we distribute PEtALS across different systems, we never know the system encoding or the encoding expected by external partners.





Re: encoding management in PEtALS

cdeneux-4
In reply to this post by cdeneux-4
Roland, I agree with you: the first thing to do is to find out why the parsers don’t return the right encoding.
Do you have any business cases in mind where the user wants to change the encoding defined in their XML document?





Re: encoding management in PEtALS

cdeneux-4
In reply to this post by cdeneux-4
Of course not.

It's just my view, but some old systems/programs may accept only certain encodings.

Another point: UTF-8 is great for ASCII characters, but if the transferred data is Chinese, for example, the XML document may be a lot larger than it would be in UTF-16 or some other encoding.
So for some partners, the encoding may need to be changed specifically.
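
The size difference is easy to check. In this small sketch the class name and the sample strings are only illustrations; UTF-16BE is used instead of UTF-16 so that Java does not add a 2-byte byte order mark to the count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: compares encoded sizes of the same text.
final class EncodedSizeComparison {

    /** Number of bytes needed to encode the text in the given charset. */
    static int sizeIn(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587\u6d88\u606f"; // four Chinese characters
        String ascii = "message";                    // seven ASCII characters

        System.out.println(sizeIn(chinese, StandardCharsets.UTF_8));    // prints "12"
        System.out.println(sizeIn(chinese, StandardCharsets.UTF_16BE)); // prints "8"
        System.out.println(sizeIn(ascii, StandardCharsets.UTF_8));      // prints "7"
        System.out.println(sizeIn(ascii, StandardCharsets.UTF_16BE));   // prints "14"
    }
}
```

Since XML markup itself is usually ASCII, the actual gain depends on the ratio of markup to text in the document.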





Re: encoding management in PEtALS

cdeneux-4
In reply to this post by cdeneux-4
I agree with you: the kernel must use the encoding specified in the XML document.

By default, Transformer and Document use the default encoding of the operating system. So when a message is sent from a French Windows machine to an English Linux machine, an unwanted encoding conversion can occur.
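
A minimal illustration of the problem, not taken from the PEtALS code: the same text is written with one charset and read with another, which is what happens when each machine relies on its own platform default. ISO-8859-1 stands in here for the default charset of the sending machine, and the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: simulates a writer and a reader using different charsets.
final class DefaultCharsetPitfall {

    /** Encodes the text with the writer's charset and decodes it with the reader's. */
    static String roundTrip(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        String original = "\u00e9t\u00e9"; // French word with two accented letters

        // Mismatched charsets: the accented letters are corrupted.
        String broken = roundTrip(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(original.equals(broken)); // prints "false"

        // Same explicit charset on both sides: the text survives.
        String safe = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(original.equals(safe)); // prints "true"

        // The charset used by String.getBytes() and new String(byte[]) when none is given.
        System.out.println(Charset.defaultCharset());
    }
}
```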


Moreover, CDK APIs that use String to hold message content should be prohibited.



