Cybersecurity – spotting the near miss
Captain Ruchin Dayal AFNI
Since the introduction of IMO Resolution MSC.428(98) (Maritime Cyber Risk Management in Safety Management Systems), most classification societies have issued guidelines to ship staff, managers and equipment manufacturers detailing best practice for safeguarding against software corruption. Similar guidance is also available from BIMCO. Near-miss reporting is an essential part of this exercise. Seafarers have been reporting near misses since the advent of the ISM Code, more than two decades ago. As such, it is tempting to ask ‘What’s the big deal?’ when it comes to doing the same thing for cyber security incidents. However, identifying or recognising a near miss in the context of cyber security requires a certain amount of training. In particular, it is hard for seafarers to recognise or be alert to cyber threats which are not yet widely discussed or documented. To date, the maritime industry has been subjective and inconsistent in its cyber related near-miss reporting. This paper attempts to create a focused line of thought for identifying and reporting cyber security near misses. In a small way, I hope it will help ship managers and sailing staff to become familiar with the requirements of a Cyber Security Management System.
What is operational technology?
In order to evaluate a near miss, it is important to recognise the impact of the incident which has been avoided. Understanding the concepts of operational technology (OT) and information technology (IT) (see Seaways, October 2020), and how they differ, is essential for estimating the potential impact of any cyber incident. In brief, any online system or component whose designed output is data is IT. Any system whose output is physical change or action can be regarded as OT. This includes all critical elements in the functioning of primary shipboard processes, such as navigation, main engine controls, etc. Any compromise in their designed capability has the potential to affect the productivity, and in many cases the safety, of the vessel, crew and the environment.
While hardware inspection regimes and preventive and predictive maintenance schedules have been well developed over time, software regimes – preventive and predictive upgrades, compatibility checks, cache clean-ups, efficiency and other related maintenance tasks – are still at a nascent stage. Creating a maintenance regime is no mean task, as any onboard system is likely to comprise machinery or equipment from several OEMs (original equipment manufacturers), connected by common protocols, but each with unique outputs and varying vulnerabilities. While each OEM is concerned with the performance of its own equipment, vessel operations depend on the information generated by integrating inputs from all of this equipment. For example, position fixing requires GNSS, ECDIS, radar, gyro, etc. Similarly, the onboard power management system may have several independent components.
Recognising and recording near misses not only contributes immensely to experiential learning, but also helps develop schedules for preventive/predictive maintenance and provides valuable technical data to the makers for improving and modifying equipment and controls.
OT cyber related corruption
In order to recognise or identify a cyber related OT near miss, it is necessary to understand how the OT system is affected in the first place. A ‘cyber related event’ generally means corruption of a software element of the OT system – either of the component in question, or of the software controlling shipboard activity as a whole. Cyber related corruption of control software can take four main forms:
1. Direct corruption
Online systems connected to the internet via VSat/Inmarsat/other links, often in order to connect to makers, managers or service contractors, are vulnerable to direct, targeted corruption of their software. Common examples include propulsion control systems, power management systems which send data for analysis to dedicated servers, or an ECDIS which depends on internet connectivity for ENC corrections.
2. Indirect corruption
This is a risk for all onboard systems with control software, even if they do not connect directly to the internet. For example, service engineers may connect their own equipment to the system to carry out diagnostics, etc. This equipment – which may be as simple as a laptop – may itself be corrupted by malware, which then gets introduced into shipboard systems in the process.
3. Natural corruption
A lot of control software is based on a standard operating platform. With the passage of time, the system resources (the hardware running the software) tend to get sluggish, whether due to a build-up of temporary files or a growing database cache, or simply because the operating systems and software versions become outdated and are eventually unsupported. A frequent observation is that these systems tend to ‘hang’. Seafarers seldom follow best practices for regular software maintenance, as they are hesitant (often rightly) to touch control software for critical systems. These legacy systems are especially vulnerable when attended to by external vendors whose laptops are loaded with contemporary software (see above).
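One way to stay ahead of this kind of ‘natural’ decay is simply to keep an inventory of each control system’s operating platform and its vendor support date, and flag anything that has lapsed. The short Python sketch below illustrates the idea; the inventory entries and system names are invented for illustration, not taken from any real vessel or vendor list.

```python
# Illustrative sketch only: flagging control systems whose operating
# platform has fallen out of vendor support, before they start to 'hang'.
# The inventory below is invented; the support dates for the two Windows
# versions are the published Microsoft end-of-support dates.
from datetime import date

inventory = {
    "ECDIS workstation": {"os": "Windows 7", "support_ends": date(2020, 1, 14)},
    "Power mgmt HMI": {"os": "Windows 10", "support_ends": date(2025, 10, 14)},
}

def unsupported(inventory, today):
    """List systems whose operating platform is past its support date."""
    return [name for name, info in inventory.items()
            if info["support_ends"] < today]

print(unsupported(inventory, date(2022, 6, 1)))  # → ['ECDIS workstation']
```

A list like this, reviewed at the same interval as the planned maintenance system, turns the vague sense that a system is ‘getting old’ into a concrete, reportable item.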
4. Unnatural corruption
This refers to ‘misuse’ of the system with the controlling software. Often, the human machine interface (HMI) may be a regular computer or even a laptop. Poorly trained or undisciplined personnel connect their personal storage devices to these systems and infect them with malware and other unwanted software. There have been reports of personnel using critical systems to watch movies during their watchkeeping. This should be considered a severe dereliction of duty, and the personnel in question must be taken to task in the most stringent manner. Even something as simple as plugging in a phone to charge can introduce malware.
Spotting the cyber element
Any corruption of data or introduction of malware may go undetected for months, coming to light only when an abnormality is noticed in the operation of the hardware. Makers or service engineers are brought into the picture only after shipboard engineers and technical superintendents have exhausted their own efforts – changing spares, cleaning, tuning, etc. The resulting investigation is often focused on mechanical defects, failures or related causes, and the software corruption at the base of it all goes unregarded. The actual cause is seldom arrived at, even after the makers have reset the machinery and things are working normally. Even when software corruption is discovered, only on very few occasions does a subsequent investigation reveal when and where it took place. Resetting the entire software is the more prevalent practice, especially when time is a constraint. Unfortunately, this may be only a temporary fix, and the same problems may recur after a period of time.
Every malfunction of critical equipment, major or minor, should be approached from a cyber security point of view in addition to the mechanical investigation. This has the potential to save a great deal of time and effort. Makers or shore service contractors should be brought into the picture early, and the probability of a cyber issue considered at the very outset.
In a nutshell, identifying a cyber element in a malfunction is not easy, and concluding that there has been a near-miss situation is extremely difficult. The following section presents a few scenarios which may be considered near misses.
OT near miss scenarios
A ‘near miss’ in the maritime world is commonly defined as one of two things:
I. A potential risk is identified and, due to corrective action taken in time, an incident is avoided – usually through due diligence.
II. A potential risk is not identified, and hence no corrective action is taken; however, an incident is still avoided – usually by sheer luck.
In the scenarios which follow, I have tried to build situations which could be considered a ‘near miss’ in both these senses. These are not exhaustive, but may help guide the thought processes in the right direction, and may help develop some ideas for what a cyber security near-miss report might look like in practice.
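Before turning to the scenarios, it may help to make the idea of a cyber near-miss report concrete. The sketch below shows one possible shape for such a record in Python; every field name, and the sample entries, are suggestions of my own rather than any standard or company form.

```python
# Hypothetical shape for a cyber near-miss report record. All field names
# and sample values are illustrative assumptions, not a standard form.
from dataclasses import dataclass

@dataclass
class CyberNearMissReport:
    vessel: str
    date: str                 # ISO date of the observation
    system: str               # affected OT/IT system, e.g. a GNSS receiver
    observation: str          # what was noticed, and by whom
    corruption_type: str      # "direct" | "indirect" | "natural" | "unnatural"
    action_taken: str
    incident_avoided_by: str  # "due diligence" or "luck" (definitions I and II)
    outcome: str = "no incident"

report = CyberNearMissReport(
    vessel="MV Example",
    date="2021-03-14",
    system="GNSS-1",
    observation="Displayed position differs slightly from ECDIS/radar repeat",
    corruption_type="direct",
    action_taken="Cross-checked receivers, alerted Master, invoked cyber response plan",
    incident_avoided_by="due diligence",
)
print(report.incident_avoided_by)  # → due diligence
```

The point is not the code but the fields: capturing which of the four corruption types was suspected, and whether the incident was avoided by diligence or by luck, is what turns an anecdote into usable experiential data.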
Scenario 1
The vessel is due to enter the Malacca Strait in six hours. The 2nd Mate is on watch and notices that the position displayed on the GPS is marginally different from that being repeated on the ECDIS and radar. Action: The 2nd Mate compares the position on the secondary GNSS, and even the third GNSS on the bridge, and finds that all of them are displaying a very slightly different position. He immediately alerts the Master. Alternative position fixing techniques are deployed. The lookout is doubled. The cyber response plan is applied immediately – the Company Security Officer (CySO) is called, makers and service contractors are contacted and their advice is followed. The possibility of spoofing is explored, and systems are reset. Outcome: No incident takes place – no slowdown, no delays. Thanks to due diligence on the part of the second officer, this is a near miss, but no worse.
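The cross-check the 2nd Mate performs here by eye – comparing fixes from several independent position sources – is simple enough to express as a calculation. The sketch below is a minimal illustration of that idea, assuming hypothetical receiver names and a 0.1 NM tolerance chosen purely for demonstration; it is not a real bridge system’s logic.

```python
# Illustrative sketch: cross-checking position fixes from independent
# sources, as the watchkeeper does manually. Source names and the 0.1 NM
# tolerance are assumptions for demonstration only.
import math

def distance_nm(pos_a, pos_b):
    """Great-circle (haversine) distance between two (lat, lon) fixes, in NM."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*pos_a, *pos_b))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(a)) * 3440.065  # mean Earth radius in NM

def cross_check(fixes, tolerance_nm=0.1):
    """Return pairs of sources whose fixes disagree by more than the tolerance."""
    names = list(fixes)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if distance_nm(fixes[a], fixes[b]) > tolerance_nm]

fixes = {
    "GNSS-1": (1.2650, 103.8200),
    "GNSS-2": (1.2652, 103.8201),
    "ECDIS":  (1.2850, 103.8200),  # repeated position has drifted off
}
print(cross_check(fixes))  # every pair involving the ECDIS is flagged
```

In the scenario above all three receivers disagreed slightly with one another, which is exactly the pattern a pairwise check like this would surface – and the sort of observation a near-miss report should record.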
Scenario 2
The vessel is due to enter the Singapore Strait in six hours. The 2nd Mate is on watch. The position being displayed on the GPS is marginally different from that being repeated on the ECDIS and radar; however, the OOW does not notice. Action: The vessel enters the Singapore Strait. Coastal navigation procedures are applied – the vessel is under the command of the Master/Pilot; radar plots are used along with visual bearings. The vessel anchors safely at Singapore anchorage for supplies, services and bunkers. A radar technician boards for a routine maintenance call, during which he discovers the discrepancy. The same routine is then followed: the cyber response plan is applied immediately, the CySO called, the makers/service contractors contacted, and their advice followed. Outcome: No incident takes place – no slowdown, no delays. Robust procedures and good luck have resulted in a near miss.
Scenario 3
The vessel is in the English Channel, heading west. Coastal navigation procedures are in place. The ECDIS display, as always on this vessel, is being used as the chart. One of the two ECDIS units (ECD2) has become sluggish; the positions are taking a second longer to update. However, nobody on the bridge notices this. The vessel enters the Atlantic and commences an open sea voyage. Action: The sluggish behaviour of ECD2 gets worse and is noticed by the Chief Mate the next day. The cyber response plan is applied immediately. ECD1 is first isolated from ECD2. The CySO is called, the makers/service contractors contacted, and their advice is followed. During discussions afterwards, the Third Officer confirms that he did notice something during the English Channel transit, but didn’t realise it was important enough to report. Outcome: No incident takes place – no slowdown, no delays. Robust procedures and good luck have resulted in a near miss.
Scenario 4
The vessel is in the Pacific in open sea conditions, heading to Taiwan to load sugar. During a routine evening round in the unattended machinery space (UMS) engine room, the 3rd Engineer notices a slight fluctuation in voltages – Auxiliary Engine 2 is in use. He takes Auxiliary Engine 1 on load and changes over. Job done; all is OK. The vessel continues her voyage. The 3rd Engineer doesn’t consider the event serious enough to report. 48 hours before entering restricted waters, the engines are stopped and a complete trial of all critical functions is carried out, as per company policy. It is observed that Auxiliary Engine 2 is behaving strangely and is unable to sync with the other engines. Action: It is decided not to use Auxiliary Engine 2 during manoeuvring. Makers/service contractors are contacted and asked to attend in Taiwan. The vessel picks up the pilot and manoeuvres safely to berth. Service engineers attend, and their investigations reveal corruption of the main controlling software. They advise that the vessel was lucky that the corruption did not affect the other three auxiliary engines. It was never discovered how and when the corruption took place; there is a strong possibility that it had been present but dormant since delivery of the vessel five years earlier. The system is formatted, loaded with clean software, calibrated and tested. The vessel completes cargo operations and proceeds for discharge to Australia without incident. Outcome: No incident takes place and there are no delays. Again, good adherence to robust procedures and a bit of good luck have resulted in a near miss, even though an initial alert point was missed.
Scenario 5
A vessel sails from Houston bound for Singapore via the Suez Canal. A planned stop at Gibraltar for provisions has to be cancelled due to Covid restrictions. As a result, the vessel has to take on fresh rations at Suez during the transit.
Once the rations have been lifted and inventoried, the chandler requests the use of a computer to print the final receipt. The request is denied, in line with company cyber security policies. However, the chandler has the pilot prevail upon the Master to allow the use of the office computer. Action: The Master requests the 2nd Mate to isolate a computer from the network, and then allows its use by the ship chandler. The machine is not reconnected until an updated virus scan can be run at Singapore. The computer is found to be infected by five different pieces of malware/adware – all from the chandler’s devices. The computer is cleaned and reconnected to the network. Outcome: No incident takes place – no delays. Due diligence led to a near miss.
Scenario 6
A vessel is anchored in Japan, waiting to berth in Nagoya. The vessel has a modern UMS engine room. The atmospheric temperature is –5°C. The accommodation air conditioning is working well, and the officers and staff are taking a well-deserved rest before they start cargo work in the morning. The Second Mate gets an alarm on the bridge at 0200 – boiler failure. The duty engineer hears the same alarm in his cabin, acknowledges it and goes to the ECR. He observes that both the main and the auxiliary boilers have failed. He tries auto ignition with little success. He makes a round of the boilers and everything seems fine – but they are not firing. Action: The duty engineer calls the Second Engineer and one of the engine crew, and they go through the routine again. It is now 0330. The temperature inside the accommodation has begun to drop. Soon, the Chief Engineer is called, along with the entire department. With the pilot booked for 0600, main engines are tested and kept on standby while the boilers are investigated part by part. The engineers narrow the fault down to the ignition chamber but are unable to rectify it. The Master orders technician attendance as a priority. The vessel berths as planned – but loading is delayed to the afternoon because of the weather. The technician boards at noon and checks the operating mechanical sections of the boiler. He then connects an HMI (laptop), runs a diagnostic and resets the firmware. The boilers are working again. Outcome: The ship’s staff have indeed been inconvenienced – but there are no operational delays on the vessel’s account. An incident, rather than a near miss.
Spotting the cyber factor
With current developments in technology, the brain is tuned to recognise any interruption to service as a ‘mechanical fault’. If our washing machine at home is not cleaning the clothes properly, we suspect a weak motor, a worn-out drum or insufficient water pressure. How many of us ever consider that the operating software may have been corrupted? Similarly, consider this: the 3rd Officer plots a radar fix and finds a difference of 1 NM from the position being repeated from the GNSS. What do we first deduce? Poor bearing and distance measurement, or wrong interpretation of the coast? How many of us immediately consider the possibility of GNSS spoofing? Unfortunately, it is the same in the engine room – if the main engine is having problems with emissions or maintaining RPM, the piston rings are checked, the governor inspected, and so on; nobody ever considers firmware or software corruption.
This thought process must change. If we are to continue enjoying the benefits of technology and automation, we will have to consider their vulnerabilities and protect ourselves adequately. We might enjoy using the Kindle or the iPad for reading books – it is convenient and even works as our library – but if we do not protect our e-reader from the hazards of the open internet, then it would be better to go back to reading hard copy books. Unfortunately, going back in time is not an option when at sea.
The cyber security threat is real, it is here to stay, and it is set to get more challenging in the future. Reporting a near miss or an incident, investigating it, reaching the root cause and sharing the experiential learning is a cornerstone of developing a healthy cyber hygiene culture. This culture is the single most important goal of a cyber security management plan. Seafarers must be encouraged to express their own ideas for recognising and reporting near misses. This is work in progress, and any contribution is valuable.
I invite readers of this article to consider the subject and come up with constructive suggestions.