I started my career with XSLT and XSL-FO programming, and Terraform is very similar to XSLT. It is like having a hammer for every nail. People hate XSLT. The syntax is unlike any programming language, and version 1.0 has very few functions (thanks to the extension packs, it can at least do something). But once you get the idea of template matching, the matching criteria work like magic, and since there is only one input and one output you never end up with a large program. Everything is text: the input, the code, the output. That is extremely friendly to Unix tools, except that XML itself is so bad for Unix tools. Any structured text is bad for command-line tools. Even if one can use XPath and tools like "jq", who can remember all that syntax?
I cannot express all my complicated feelings about XSLT with my limited language skills. Terraform shares every property that makes people hate XSLT. And it is worse, because
- There is no document model to work on. The code itself is the data. Terraform has data structures like lists and maps, but unfortunately they are only useful in interpolation. A resource can be linked to a data structure through a workaround (count=...), but it would be great to manipulate resources and data structures in the same way. Even in interpolation, there is no way to match and filter something. The only comparable thing is yet another workaround, like printing two lists of identical length into one.
For XSLT there is an often-used workaround: build a data structure in memory and use it in the same way as the input document.
- Or should we say that a resource is strongly typed data, and the state is both input and output? Is the purpose of the DSL to hide the operations on the state, so that everything is a declaration and the manipulations are decided by the providers?
Each provider is simply a set of methods that match its own resource types. In XSLT they are called templates. Importing a set of templates and combining them with hand-written templates is easy. Now try importing a provider and extending its functions without waiting for an upstream release. The most recent addition (Lambda concurrency) was released in 1.6, which was already two weeks after the re:Invent conference. How can this compete with the upstream tools?
The special data types can only be operated on by providers, and a provider is now a compiled blob, not even text. When a parameter's definition is not documented, go back to GitHub to read the source code, and good luck.
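To make the template comparison concrete, here is a minimal sketch in Python of what I mean by combining an imported set of templates with a hand-written one. It is not how Terraform or any XSLT processor is implemented. The resource types, field names, and the `concurrency` setting are all invented for illustration.

```python
from typing import Callable, Dict, List

# A "template" is a function matched by resource type (hypothetical model).
Template = Callable[[dict], str]

# Templates as an imported "provider" might supply them (names are made up).
imported_templates: Dict[str, Template] = {
    "bucket": lambda res: f"create bucket {res['name']}",
    "function": lambda res: f"create function {res['name']}",
}


def function_with_concurrency(res: dict) -> str:
    """Hand-written template that extends the imported one for 'function'."""
    base = imported_templates["function"](res)
    limit = res.get("concurrency")
    return base if limit is None else f"{base} with concurrency {limit}"


# The hand-written template overrides the imported one for the same match,
# and the remaining imported templates are kept unchanged.
templates: Dict[str, Template] = {
    **imported_templates,
    "function": function_with_concurrency,
}


def apply_templates(resources: List[dict]) -> List[str]:
    """Dispatch every resource to the template that matches its type."""
    return [templates[res["type"]](res) for res in resources]
```

The point of the sketch is the override step: the extension lives next to my own code, so nothing has to wait for an upstream release.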
- When a mismatch between the state file and the actual state cannot be resolved by a provider, there is no way to set a policy. The most common case is a renamed resource. Sometimes the new resource name can be mapped to the actual resource, but only _before_ the old resource name is deleted (otherwise the actual resource is gone with it). A successful "tf plan" rarely means that "tf apply" will run without issue. Full automation will never be possible if there is no way to set a policy that decides what to do with a collision.
It is not a simple process like XSLT, which generates an output from an input. There is more to it: how to reconcile the output state with the existing state. It is more like database upgrade scripts. A table cannot be renamed, and there is no "resource name" to refer to a table. The database scripts must run strictly in order. And even so, releasing database upgrade scripts can be a real headache. At one time I also wanted to use some scripts to forget and re-import a resource under the same name, but I am not smart enough to implement that.
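The forget and re-import idea could look roughly like the sketch below. It only builds the command lines and does not run them. It assumes the resource type supports `terraform import` and that the real resource ID is already known, which is exactly the hard part. The function name and the example addresses are hypothetical.

```python
from typing import List, Optional


def forget_and_reimport(
    old_address: str, real_id: str, new_address: Optional[str] = None
) -> List[List[str]]:
    """Build the commands that drop a resource from the state and import it again.

    If new_address is omitted, the resource is re-imported under the same name.
    """
    target = new_address or old_address
    return [
        # Forget the resource in the state file; the real resource is untouched.
        ["terraform", "state", "rm", old_address],
        # Attach the existing real resource to the (possibly new) address.
        ["terraform", "import", target, real_id],
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them (placeholder values).
    for cmd in forget_and_reimport(
        "aws_lambda_function.old", "my-function", "aws_lambda_function.new"
    ):
        print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)`, but the order matters and there is no rollback if the second step fails, so this is a sketch and not something I would trust unattended.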
I really wish I could declare in Terraform how to rename a resource, or just not use resource names at all. Now think about this scenario:
1. All scripts have to be tested in a non-PROD environment before going to PROD.
2. The old version has a resource that must have a new name in the new version.
3. The tests must be able to run multiple times, since they test the code, and full automation is expected in both environments.
4. The restriction is that Terraform cannot roll back a state, just like database upgrade scripts cannot roll themselves back.
5. "tf plan" does not detect collisions, so eventually we have to run "tf apply" manually first and then resolve any issues by hand. If "tf plan" could at least detect a collision and report things like the GUID of a Lambda trigger, that would be very helpful, because it could be scripted to implement our own policy.
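As an illustration of the kind of policy I would like to script, here is a minimal sketch that assumes the plan were available as a list of records with an address, a type, and a list of actions. That input shape is my own simplification, not something "tf plan" gives me today, and the heuristic (one delete and one create of the same type looks like a rename) is only a guess that a human would still have to confirm.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_rename_candidates(changes: List[Dict]) -> List[Tuple[str, str]]:
    """Pair a deleted address with a created one when they look like a rename.

    A pair is reported only when a resource type has exactly one delete and
    exactly one create, so ambiguous cases are left for a human to decide.
    """
    deleted = defaultdict(list)
    created = defaultdict(list)
    for change in changes:
        if change["actions"] == ["delete"]:
            deleted[change["type"]].append(change["address"])
        elif change["actions"] == ["create"]:
            created[change["type"]].append(change["address"])

    pairs = []
    for rtype in sorted(deleted):
        if len(deleted[rtype]) == 1 and len(created.get(rtype, [])) == 1:
            pairs.append((deleted[rtype][0], created[rtype][0]))
    return pairs


if __name__ == "__main__":
    # Hypothetical plan records; the addresses are placeholders.
    plan = [
        {"address": "aws_lambda_permission.old", "type": "aws_lambda_permission", "actions": ["delete"]},
        {"address": "aws_lambda_permission.new", "type": "aws_lambda_permission", "actions": ["create"]},
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket", "actions": ["create"]},
    ]
    print(find_rename_candidates(plan))
```

With a report like this, the automation could stop before "tf apply" and either move the old name to the new one or fail loudly, instead of deleting the real resource.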