Frequently Asked Questions
What is the Open Source Insights project?
Open Source Insights is a service developed and hosted by Google to help developers better understand the structure, security, and construction of open source software packages. The service examines each package, constructs a full, detailed graph of its dependencies and their properties, and makes the results available to anyone who could benefit from them.
The dependency graph is decorated with versioning and licensing information, known vulnerabilities, and other important signals of code health and safety.
The data is provided not just for the project’s own code, but for all packages in the fully constructed graph of all dependencies of the project, right down to the individual versions of each dependency. This means, for instance, that if the software depends, directly or indirectly, on a version of a package with a known vulnerability, that vulnerability will be visible at the top level of the project’s Insights web view.
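To make the idea concrete, here is a minimal sketch of how a vulnerability deep in the graph can be surfaced for the top-level package. The package names, versions, advisory ID, and the function itself are hypothetical and only illustrate the traversal; this is not the Insights implementation.

```python
from collections import deque

# Hypothetical data: each package version maps to its direct dependencies.
DEPENDENCIES = {
    "app@1.0.0": ["web@2.1.0", "log@0.9.0"],
    "web@2.1.0": ["parser@3.0.1"],
    "log@0.9.0": [],
    "parser@3.0.1": [],
}

# Hypothetical advisories, keyed by the exact version they affect.
ADVISORIES = {"parser@3.0.1": ["EXAMPLE-0001"]}


def vulnerabilities_in_graph(root, dependencies, advisories):
    """Return {package_version: [advisory ids]} for every affected node reachable from root."""
    found = {}
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node in advisories:
            found[node] = list(advisories[node])
        for dep in dependencies.get(node, []):
            if dep not in seen:  # visit each package version once
                seen.add(dep)
                queue.append(dep)
    return found
```

Calling `vulnerabilities_in_graph("app@1.0.0", DEPENDENCIES, ADVISORIES)` reports the advisory on `parser@3.0.1`, even though `app` never declares `parser` directly.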
Insights shows you the licenses used throughout the dependency graph, which you can use to find conflicts or other license issues.
It also provides interactive tools to visualize the dependency graph, compare versions, filter the dependencies, and more.
Finally, Insights shows version histories and other relevant information.
Why have you built this?
We want everyone to have a productive, safe, dependable and trustworthy open source software environment. Understanding a project’s dependencies is a critical part of meeting that goal. An unwisely chosen dependency can introduce licensing or security problems, while simply having too many dependencies can make maintenance challenging. Developers everywhere are updating their code every day, and that could affect your own software unexpectedly. It can be hard to keep up.
The Insights project aims to help by providing developers and project owners insights into the health of their software by integrating information about all their dependencies and providing a way to see how they fit together. Existing tools and packaging systems do some of this, of course, but not enough and not uniformly.
We hope that access to high-quality information and analysis about open source software projects will make it easier to build and maintain high-quality software.
How does this differ from what is provided by my usual tooling such as npm or by other services?
Insights is not an attempt to replace the standard tool set, but rather to augment it with a fresh, integrated view of the whole ecosystem for each packaging model.
A key difference is that the Insights data is derived from first principles, looking at the software and its packaging definition. The result may be substantially different or more complete than just the declared dependencies of, for instance, a packaging “lock” file. Moreover, the data presented by Insights is re-evaluated regularly, to keep it up to date, which is important in the fast-moving world of open source development.
Some of the information provided by the Insights project may be available from your regular “native” tools, although just how much depends greatly on the packaging system being used. However, none of the systems provide all the information Insights does. Moreover, although it may not matter unless you work in multiple languages, the Insights service attempts to provide, as much as possible, the same analysis and user experience for every packaging system.
The site includes not just details about the dependencies of a package, but also some information about its dependents: the packages that depend on it, directly or transitively. This is useful to package authors, and it also helps locate packages of special importance to the ecosystem, namely those depended on, directly or indirectly, by many other packages.
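As a rough illustration of what "dependents, transitively" means, the sketch below inverts a dependency map and walks it upward. The graph and the function name are hypothetical; the actual service may compute this quite differently.

```python
from collections import defaultdict


def transitive_dependents(target, dependencies):
    """Return every package that depends on target, directly or indirectly."""
    # Invert the edges: for each package, record who depends on it.
    reverse = defaultdict(set)
    for package, deps in dependencies.items():
        for dep in deps:
            reverse[dep].add(package)

    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in reverse.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    seen.discard(target)  # a cycle should not make a package its own dependent
    return seen


# Hypothetical graph: package -> direct dependencies.
EXAMPLE_GRAPH = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
```

Here `transitive_dependents("c", EXAMPLE_GRAPH)` yields `a`, `b`, and `d`: `b` and `d` depend on `c` directly, while `a` reaches it through `b`.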
Finally, Insights integrates the metadata, showing how licensing problems, security vulnerabilities, and other relevant issues affect the whole system, not just the project itself.
How do I access the information?
The Insights portal provides a web page for each project. It can be used to examine the data, study it, and perhaps help understand how to address problems that it uncovers.
The data is also available via our API and as a BigQuery dataset.
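A minimal sketch of querying the API from Python follows. The base URL and the path layout are assumptions not stated in this FAQ, so check them against the API documentation before relying on them; the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL and path layout; neither is given in this FAQ.
API_BASE = "https://api.deps.dev/v3"


def version_url(system, name, version):
    """Build the URL for one package version, percent-encoding each path segment."""
    return (
        f"{API_BASE}/systems/{quote(system, safe='')}"
        f"/packages/{quote(name, safe='')}"
        f"/versions/{quote(version, safe='')}"
    )


def fetch_version(system, name, version, timeout=10):
    """Fetch and decode the JSON document for one package version (needs network access)."""
    with urlopen(version_url(system, name, version), timeout=timeout) as response:
        return json.load(response)
```

Each path segment is encoded separately so that scoped names such as `@types/node` do not introduce extra slashes into the URL.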
How is Insights implemented?
The project is hosted on Google Cloud Platform. It scans the public open source packaging systems and repositories, and from the data received constructs the dependency graph and annotates it with other metadata such as licensing and vulnerability information.
Most of the back end is written in Go and reimplements the dependency analysis algorithm (version resolution) of each packaging system. The code has been tested for agreement with the native tooling, but is faster because it is decoupled from package installation. It constructs the graph without actually building and installing the software.
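The toy resolver below illustrates how a dependency graph can be built from package metadata alone, with nothing downloaded or installed. It is deliberately simplistic: the registry, the half-open version ranges, and the "pick the highest matching version" rule are all assumptions for the example, and real packaging systems use more involved rules that differ from one another.

```python
# Hypothetical registry: name -> version -> {dependency: (low, high)},
# where the allowed range is low <= version < high.
REGISTRY = {
    "app": {(1, 0, 0): {"lib": ((1, 0, 0), (2, 0, 0))}},
    "lib": {
        (1, 2, 0): {},
        (1, 5, 0): {"util": ((0, 1, 0), (1, 0, 0))},
        (2, 0, 0): {},
    },
    "util": {(0, 3, 0): {}, (1, 0, 0): {}},
}


def pick_version(name, low, high, registry):
    """Choose the highest published version of name within [low, high)."""
    candidates = [v for v in registry[name] if low <= v < high]
    if not candidates:
        raise LookupError(f"no version of {name} satisfies the range")
    return max(candidates)


def resolve(name, version, registry):
    """Return {(name, version): [(dep_name, dep_version), ...]} using metadata only."""
    graph = {}
    stack = [(name, version)]
    while stack:
        node = stack.pop()
        if node in graph:
            continue  # already resolved this package version
        requirements = registry[node[0]][node[1]]
        chosen = [
            (dep, pick_version(dep, low, high, registry))
            for dep, (low, high) in requirements.items()
        ]
        graph[node] = chosen
        stack.extend(chosen)
    return graph
```

Because the function only reads declared requirements, its cost depends on the size of the metadata rather than on fetching or building any package.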
For a more detailed discussion of how the project works and the terminology it uses, see the Glossary.
What packages does Insights cover?
The project scans all the available packages it can discover, either by reading the package home site for a system like npm, or by scanning GitHub and other repository hosting sites.
The project currently supports Cargo (Rust), Go’s module system, Maven (Java), npm (Node.js), NuGet (.NET) and PyPI (Python). At the moment, Insights can only analyze systems, like these, with a known packaging model, since it needs the packaging information to construct the dependency graph. This means that, for now at least, there is no data for C or C++, which do not have a clear packaging model.
What systems will it cover in the future?
We are working to include a number of other packaging systems. There is a fair bit of work to do for each system we add, though, so we cannot predict just when new systems will be supported or what they will be.
How fresh is the information?
The Insights service’s view of a package is updated by two independent mechanisms. Several feeds maintain the information for packages being actively updated. Meanwhile a background scan visits every known package at a constant rate to catch any updates that might be missed.
As a result, the data for commonly used packages is usually fresh, up to date to within an hour or so. Quiescent or obsolete packages can be presented with staler data, however.
There is no mechanism for users to trigger an update.
How accurate is the information?
The Insights team has independent implementations of the resolve algorithms that compute the dependencies for a package. These have been tested against the “native” implementations, and given identical inputs the results agree closely: 99% or higher, often much higher. Differences can arise because of version skew, undocumented or obscure features of the packaging model, input from the build system not available to us, and other factors.
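The FAQ does not say exactly how agreement is measured. One plausible measure, shown here purely as an assumption, is the fraction of packages for which two resolvers produce exactly the same set of resolved dependencies.

```python
def agreement_rate(ours, native):
    """Fraction of shared packages whose resolved dependency sets match exactly.

    Both arguments map a package to an iterable of resolved "name@version" strings.
    """
    common = ours.keys() & native.keys()
    if not common:
        raise ValueError("no packages in common to compare")
    matches = sum(1 for pkg in common if set(ours[pkg]) == set(native[pkg]))
    return matches / len(common)


# Hypothetical results from two resolvers for the same four packages.
OURS = {
    "a": ["x@1.0.0"],
    "b": ["x@1.0.0", "y@2.0.0"],
    "c": [],
    "d": ["z@0.1.0"],
}
NATIVE = {
    "a": ["x@1.0.0"],
    "b": ["y@2.0.0", "x@1.0.0"],
    "c": [],
    "d": ["z@0.2.0"],  # version skew on one dependency
}
```

With this data the two resolvers agree on three of four packages, so `agreement_rate(OURS, NATIVE)` is 0.75. A stricter or looser measure, such as per-edge comparison, would give different numbers.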
Be aware, too, that the dependency graph for a package is not always a unique item, as it can depend on whether test or other dependencies are included, what features are enabled, and so on. Since the graph calculation is transitive, even a small change in any detail of the dependency specification can affect the entire graph.
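The following sketch shows how including or excluding one kind of dependency changes the resulting graph. The scope labels, package names, and traversal rule are hypothetical; real packaging systems define scopes and features in their own ways.

```python
# Hypothetical graph: package -> list of (dependency, scope) pairs.
EDGES = {
    "app": [("web", "runtime"), ("testkit", "test")],
    "web": [("parser", "runtime")],
    "testkit": [("mock", "runtime")],
    "parser": [],
    "mock": [],
}


def reachable(root, edges, include_scopes):
    """Return all packages reachable from root via edges whose scope is included."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for dep, scope in edges.get(node, []):
            if scope in include_scopes and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen
```

With only `runtime` edges the graph for `app` contains `web` and `parser`. Adding the `test` scope brings in `testkit` and, transitively, `mock`, which shows how one extra edge at the top can enlarge the whole graph.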
How are licenses determined?
For Cargo, Maven, npm, NuGet and PyPI, license information is read from the package metadata. For Go, license information is determined using the licensecheck package.
License information shown is not intended to be legal advice, and you should independently verify the license or terms of any software for your own needs.
We identify licenses as SPDX expressions. When there is no associated SPDX identifier, we indicate the license is non-standard. When we are unable to obtain license information, we indicate the license is unknown.
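The three outcomes described above can be sketched as a small classifier. The set of identifiers is a tiny illustrative subset of the SPDX license list, the expression handling covers only `AND`, `OR`, and parentheses, and the function is hypothetical rather than the logic Insights actually uses.

```python
import re

# Tiny illustrative subset of SPDX license identifiers.
KNOWN_SPDX_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause", "GPL-3.0-only"}


def classify_license(declared):
    """Classify a declared license string as 'spdx', 'non-standard', or 'unknown'."""
    if declared is None or not declared.strip():
        return "unknown"  # no license information could be obtained
    # Split a simple expression into its identifiers.
    tokens = re.split(r"\s+(?:AND|OR)\s+|[()]", declared)
    ids = [token.strip() for token in tokens if token.strip()]
    if ids and all(identifier in KNOWN_SPDX_IDS for identifier in ids):
        return "spdx"
    return "non-standard"  # text is present but has no SPDX identifier
```

For example, `classify_license("(MIT OR Apache-2.0)")` is classified as `spdx`, free-form text such as `"Custom License v1"` as `non-standard`, and a missing value as `unknown`.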
Is this open source?
The underlying platform is not open source, although some pieces, such as the licensecheck package, are available.
How can I get more information or ask for new features?
Feel free to contact us at email@example.com.
Who is that creature on the web page?
That’s our team mascot, Ol’ Cap’n Napkins, designed by Renee French and copyrighted by Google LLC.