GSoC 2019: Haskell (Part-II)
Welcome!
Finally, it’s time to wrap up and finalize the project. Here is my GSoC project HsYAML.
Undoubtedly, it was the best summer.
A big THANK YOU shout out to my mentors Herbert Valerio Riedel and Michał J. Gajda.
Without them, it would have been impossible for me to achieve my GSoC goals.
In this blog post, I am going to describe and summarize my GSoC work.
$ man HsYAML
HsYAML is a YAML 1.2 processor.
Features of HsYAML include:
- Pure Haskell implementation and emphasis on strict compliance with the YAML 1.2 specification.
- Direct decoding to native Haskell types via (aeson-inspired) typeclass-based API.
- Allows round-tripping while preserving ordering, anchors, and comments at the event level.
- Support for constructing custom YAML node graph representation (including support for cyclic YAML data structures).
- Support for emitting YAML in the desired format.
- Event-based API resembling LibYAML’s Event-based API.
Performance:
- HsYAML is now the best YAML processor among all the YAML processors from yaml-editor.
- HsYAML has more than 99% conformance on YAML-Test-Suite.
(For more information, visit YAML-Test-Matrix, which combines all tests from YAML-Test-Suite with the performance of all processors from yaml-editor.)
$ git log
I started my GSoC journey a few days after the results were announced.
1) First Fix
For the next few days, I spent long hours staring at the YAML grammar productions and the HsYAML tokenizer to fix the failing tests of YAML-Test-Suite. Finally, after many futile attempts, I was able to fix some of them. (PR #10, #14)
done -- passed: 316 / failed: 2
Now HsYAML has more than 99% conformance on YAML-Test-Suite and it is the best YAML processor among all the YAML processors from yaml-editor.
2) Better error messages
The next task that I undertook was to output better error messages at each layer of the YAML loading pipeline. Better error messages are crucial because they make debugging, and life in general, easier 😉.
Now all the error messages include the position of the error in the YAML document. (PR #19)
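As a rough illustration of what a located error looks like from the user's side, here is a minimal sketch assuming the HsYAML-0.2 `Data.YAML` API (`decode1`, `withMap`, `.:`, `prettyPosWithSource`). The `Person` record and the `decodePerson` helper are hypothetical names invented for this example; they are not part of the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Lazy.Char8 as BL
import           Data.Text                  (Text)
import           Data.YAML

-- Hypothetical record, used only for this illustration.
data Person = Person { name :: Text, age :: Int }
  deriving (Show, Eq)

instance FromYAML Person where
  parseYAML = withMap "Person" $ \m ->
    Person <$> m .: "name" <*> m .: "age"

-- Decode a single document. On failure, render the reported position
-- together with the source text it refers to.
decodePerson :: BL.ByteString -> Either String Person
decodePerson raw =
  case decode1 raw of
    Right p          -> Right p
    Left (pos, emsg) -> Left (prettyPosWithSource pos raw " error" ++ emsg)

main :: IO ()
main = do
  -- A well-formed document decodes to a Person.
  print (decodePerson "name: Erik Weisz\nage: 52\n")
  -- Here 'age' is not an integer, so decoding fails with a located message.
  either putStrLn print (decodePerson "name: Erik Weisz\nage: fifty-two\n")
```

The second call should print a message that points at the line and column of the offending value instead of a bare parse failure; the exact wording depends on the library version.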
3) Dumping pipeline
This was one of the most significant and challenging goals that I planned to complete as part of my GSoC project.
Previously, HsYAML only had the support for loading YAML documents into Haskell data structures.
Now, any Haskell data structure that can be made an instance of the ToYAML class can be dumped into a YAML document. (PR #20)
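A minimal sketch of the dumping side, assuming the HsYAML-0.2 `Data.YAML` API (`ToYAML`, `mapping`, `.=`, `encode1`). The `Person` record and the `renderPerson` helper are hypothetical and exist only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Lazy.Char8 as BL
import           Data.Text                  (Text)
import           Data.YAML

-- Hypothetical record, used only for this illustration.
data Person = Person { name :: Text, age :: Int }

-- Describe how a Person becomes a YAML mapping node.
instance ToYAML Person where
  toYAML (Person n a) = mapping ["name" .= n, "age" .= a]

-- Serialize one value as a single YAML document.
renderPerson :: Person -> BL.ByteString
renderPerson = encode1

main :: IO ()
main = BL.putStr (renderPerson (Person "Erik Weisz" 52))
```

The instance only builds a node; how that node is written out (key order, scalar style) is decided by the encoder, so the order of keys in the output need not match the order in the `mapping` list.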
4) Preserving format during a round-trip
This was another critical and challenging goal that I planned to complete as a part of my GSoC project.
Now, I can proudly say that HsYAML is one of the few format-preserving YAML processors, i.e., it is able to preserve many components of a YAML document, such as comments, ordering, and anchors, while round-tripping at the event level. (PR #18, #24).
Note: Non-content indentation and spacing are not preserved.
Preserving comments was the second toughest part of my GSoC journey.
(Keep reading to know more about the hardest task 😊)
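An event-level round trip can be sketched roughly as follows, assuming the HsYAML-0.2 `Data.YAML.Event` API (`parseEvents`, `writeEvents`, `EvPos`) and that `Encoding` is imported from `Data.YAML.Token`. The `roundTrip` helper is a hypothetical name for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Lazy.Char8 as BL
import           Data.YAML                  (Pos)
import           Data.YAML.Event            (EvPos (eEvent), parseEvents, writeEvents)
import           Data.YAML.Token            (Encoding (UTF8))

-- Parse a document into its event stream, then write the same events back.
-- Each parsed event carries a position, which is dropped before writing.
roundTrip :: BL.ByteString -> Either (Pos, String) BL.ByteString
roundTrip input = do
  events <- sequence (parseEvents input)
  Right (writeEvents UTF8 (map eEvent events))

main :: IO ()
main =
  case roundTrip "# server settings\nhost: localhost\nport: 8080\n" of
    Left (_, emsg) -> putStrLn ("parse error: " ++ emsg)
    Right out      -> BL.putStr out
```

Because comments, anchors, and key order are all part of the event stream, they travel through this function untouched; only non-content indentation and spacing may come out differently.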
5) Custom and Schema Encoders
A YAML schema is a combination of a set of tags and a mechanism for resolving non-specific tags.
While many users prefer to stick to the pre-defined YAML 1.2 schemas (Core schema, JSON schema, Failsafe schema), for some use-cases (e.g., when using YAML as an EDSL, such as hpack) it might be desirable to emit the YAML document with a custom application-specific YAML schema encoding. (PR #21)
Also, some users prefer dumping YAML files in a particular format.
For example:
Some prefer double-quoted scalars like
"This is a double-quoted scalar"
over plain scalars like
This is a plain scalar
So, an API is provided for creating custom encoders. (PR #26)
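As a hedged sketch of such a custom encoder, the following assumes the HsYAML-0.2 names `setScalarStyle`, `coreSchemaEncoder`, and the `schemaEncoderScalar` field from `Data.YAML.Schema`, plus `encodeNode'` and `Doc` from `Data.YAML`. The helpers `quoteStrings` and `quotingEncoder` are hypothetical names; check the signatures against the released documentation before relying on them.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Lazy.Char8 as BL
import           Data.Text                  (Text)
import           Data.YAML                  (Doc (Doc), ToYAML (toYAML), encodeNode')
import           Data.YAML.Event            (ScalarStyle (DoubleQuoted), Tag)
import           Data.YAML.Schema           (Scalar (SStr), SchemaEncoder (..),
                                             coreSchemaEncoder, setScalarStyle)
import           Data.YAML.Token            (Encoding (UTF8))

-- Reuse the Core schema encoder for every scalar, but force the
-- double-quoted style for strings. Non-string scalars are left as-is.
quoteStrings :: Scalar -> Either String (Tag, ScalarStyle, Text)
quoteStrings s@(SStr _) =
  fmap (\(tag, _style, txt) -> (tag, DoubleQuoted, txt))
       (schemaEncoderScalar coreSchemaEncoder s)
quoteStrings s = schemaEncoderScalar coreSchemaEncoder s

-- Core schema encoder with the scalar styling replaced.
quotingEncoder :: SchemaEncoder
quotingEncoder = setScalarStyle quoteStrings coreSchemaEncoder

main :: IO ()
main = BL.putStr
  (encodeNode' quotingEncoder UTF8
     [Doc (toYAML ("This is a double-quoted scalar" :: Text))])
```

Delegating to `coreSchemaEncoder` and changing only the style keeps the tag resolution of the Core schema intact, so integers, booleans, and nulls are still emitted as plain scalars.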
6) Testing, Documentation, and Tutorials
After completing a major part of my GSoC goals, I wrote a bunch of QuickCheck tests. (PR #21)
Fun/Embarrassing Fact: After testing, I found that I had forgotten to write code to handle very basic test cases like the empty mapping {} and the empty sequence [].
After the testing phase was completed, I wrote a lot of documentation and tutorials. (PR #25)
(This was the toughest part 😅)
7) Dumping Pipeline in HsYAML-aeson
HsYAML-aeson is a JSON to YAML Adapter, i.e. a library that allows us to process YAML documents in the more limited JSON data-model.
Until now, HsYAML-aeson only supported decoding YAML documents into JSON values, so I added an API to complete its dumping pipeline.
It now supports dumping JSON values into YAML documents, and it lets us reuse Aeson’s ToJSON instances for encoding native Haskell data types. (PR #1)
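A minimal sketch of reusing a ToJSON instance to emit YAML, assuming the `encode1` function exported by `Data.YAML.Aeson` in HsYAML-aeson-0.2. The `Server` record and the `renderServer` helper are hypothetical names for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import           Data.Aeson                 (ToJSON (..), object, (.=))
import qualified Data.ByteString.Lazy.Char8 as BL
import qualified Data.YAML.Aeson            as YA

-- Hypothetical record, used only for this illustration.
data Server = Server { host :: String, port :: Int }

-- An ordinary Aeson instance; nothing here is YAML-specific.
instance ToJSON Server where
  toJSON (Server h p) = object ["host" .= h, "port" .= p]

-- Go through the JSON value and serialize it as one YAML document.
renderServer :: Server -> BL.ByteString
renderServer = YA.encode1

main :: IO ()
main = BL.putStr (renderServer (Server "localhost" 8080))
```

This route is limited to the JSON data model, so YAML-only features such as anchors or custom tags are not available through it.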
8) Working on other repositories
- Dhall-json and Pandoc-citeproc
  To test HsYAML further, I replaced the yaml library API with the HsYAML API in repositories such as dhall-json and pandoc-citeproc.
  The contributors/maintainers of dhall-json were very satisfied because they could now easily use Eta and GHCJS, which was previously not possible because the yaml library used a C-based parser.
  (Dhall-json PR #1248, Pandoc-citeproc PR #412)
- Pandoc
  I made some modifications to the pandoc repository so that it is able to use HsYAML-0.2’s API.
  (Pandoc PR #5704)
$ git status
All the goals mentioned in my project proposal were achieved, and I will continue contributing to the project in the future.
HsYAML-0.2 will be released very soon.
Stay tuned!
Edit:
Check out the new v0.2 release of the HsYAML and HsYAML-aeson libraries.