What’s the bug about?
This week in class, my partners @irrationalRock, @Woody88 and I worked on a bug in the Brave browser. The bug was about the way the address bar parsed urls. Our goal was to first write the tests for for the bug, then create fixes that address the issue.
We looked at the following scenarios:
General strings
• “dog”
• ” dog ”
• “dog cat”
• ” dog cat ”
URLs with query string
• “https://www.google.ca/search?q=dog”
• ” https://www.google.ca/search?q=dog ”
• “https://www.google.ca/search?q=dog cat” (failed to properly set search query string)
• ” https://www.google.ca/search?q=dog cat ” (Failed to properly set search query string)
File paths
• “/path/to/file/dog cat.txt” (failed to render)
• ” /path/to/file/dog cat.txt ” (failed to render)
• ” C:\Path\to\file with space.txt” (failed to render)
• ” C:\Path\to\file with space.txt ” (failed to render)
• “file:////path/to/file/dog cat.txt”
• ” file:////path/to/file/dog cat.txt ”
We’ve come to the conclusion that whenever there is a space in a URL’s query string or a file path (Windows or Unix), the browser would take the entire string and perform a google search.
After examining the urlutil.js, we began writing a few tests inside test/unit/lib/urlutilTest.js
. The test does the following:
• Ensure prependScheme() is able to prepend the file scheme in front of a Unix file path
• Ensure prependScheme() is able to prepend the file scheme in front of a Windows file path
• Ensure URL encoding works for space characters in absolute paths
• Ensure URL encoding works for space characters in URLs with query string
We made sure the test would fail after writing the test first then went to hunt for the bug.
Looking for the bug
After cloning the repository and installing the node packages via npm install
.
We started the debugging process by executing npm run watch
followed by enabling the debugger in VS Code. Looking inside js/lib/urlutil.js
, we examined the way isNotURL()
parsed a string to determine if it is a URL or not.
This is important as the software’s workflow in the following order:
1. Determine if it has a valid scheme (i.e. Http://, Https://, file://)
2. Determine if it can potentially be URL
3. Trim the input.
4. Prepend the scheme if it is not already provided.
5. Check if the URL can be parsed and send it to the view.
@irrationalRock noticed the issue lies in the regular expressions used in isNotURL()
.
const case2Reg = /(^\?)|(\?.+\s)|(^\.)|(^[^.+]*[^/]*\.$)/
The intent of the second regular expression was to filter out inputs that match the following cases:
- starts with “?” or “.”
- contains “? “
- ends with “.” (and was not preceded by a domain or /)
However, in doing so, the case (\?.+\s)
ended up filtering URIs with legitimate query strings. The “.+” of the regular expression represents one or more non-newline characters; The original intent however, was to filter out the case where the input contains “? “.
The solution was to change this regular expression to the following:
const case2Reg = /(^\?)|(\?\s+)|(^\.)|(^[^.+]*[^/]*\.$)/
By modifying that particular part of the regular expression to (\?\s+)
we corrected the problem by specifying the filter to only engage if it detects a “?” character followed by one or more spaces.
This resolved the issue with a space in the query string. While we originally explicitly converted the space character into %20 to encode the URL, Professor Humphrey has noted that the URL encoding may occur implicitly when it gets sent back out to the browser. We did some extra investigation, and indeed, once the input string reaches getUrlFromInput()
, return new window.URL(input).href
on line 160 of urlutil.js will automatically URL encode the space character.
This resolved our very first bug.
Resolving Unix absolute file path with space in file name issue
The second problem was Brave interpreting UNIX absolute file path with a space in the file name as a string for google search. While tracing the way Brave parses an input string of such case, I’ve noticed that the regular expression in isNotUrl()
has once again over aggressively classified it as a non-url before the file scheme can be appended to the input string.
The problem lies in line 134 of urlutil.js
if (case2Reg.test(str) || !case3Reg.test(str) || (scheme === undefined && /\s/g.test(str))) { return true }
In the case of an absolute file path with a space in the file name, scheme === undefined && /\s/g.test(str)
will classify it as not a url.
I first consulted with Google Chrome and Firefox to see how these two popular browsers handled UNIX absolute file paths with space in the file names and what occurs when bogus paths were provided. This was done by directly testing inputs in these browsers as well as reading the opensource code for both projects.
It turns out that both browsers assume that if a string begins with a forward slash, it is a UNIX absolute file path regardless the path actually exists or not.
My solution then was to check if the input string begins with a “/” character; The regular expression: /^\/
resolved the issue.
Resolving Windows file path with name in file path issue
The last issue to tackle was same issue as the UNIX absolute file path issue, except this time on the Windows platform. The difficulty lies in the fact that Windows uses drive letters and backward slashes.
Due to the similarity with this issue and the UNIX absolute file path with space in file name issue, my proposed solution was to add on top of what I’ve already written. I refactored the code I’ve written previously and instead created a new regular expression case.
const case5Reg = /(?:^\/)|(?:^[a-zA-Z]:\\)/
The “?:” symbol along with ( ) to contain the regular expression represents a grouped regular expression without tracking. In addition the “|” symbol represents an OR statement followed by regular expression that will detect any windows drive letter.
This allows a Windows file path to be treated as a URL. However, doing this alone is not enough. I went to research on how a windows file path is represented using the file scheme.
Once I’ve understood how a URI should look for a file scheme with windows path, I’ve added a windows file scheme constant that detects windows drive letters in a case insensitive manner. Inside prependScheme()
, I’ve added additional logic to check for Windows file path.
Once a Windows file path is detected, it will use regular expression to replace backward slashes with a single forward slash.
if (windowsFileScheme.test(input)) { input = input.replace(/\\/g, '/') input = `${fileScheme}/${input}` }
After rerunning all the tests using npm run test -- --grep="urlutil"
, I was able to verify that all test cases (including the test cases I’ve written) passed. Just to be safe, I launched Brave to test each fix and indeed all of the issues mentioned have been fixed.
Conclusion
The test driven development style of programming is new for me and took a bit of time getting used to. It is difficult to visualize what a fix would look like before I have the code already written. However, it pushed me to read the underlying code and explore how Brave operates, the functions traversed while it parses the URI string that the user enters into the address bar, which gave me an idea for how these tests should look like.
Furthermore, I’ve learned to read code that other developers have written for other opensource projects. As a result, I saw new ways of writing code that I have never seen before. This gave me the opportunity to understand how to write better and concise code.