Live Graph Lab: Towards Open, Dynamic and Real Transaction Graphs with NFT

Temporal graphs provide a more accurate representation of real-world systems like social networks, citation networks and the Web. By studying temporal graphs, we can gain insights into the temporal dynamics and understand how these systems evolve and function.

In the current literature, studies are usually conducted on a set of outdated and incomplete graphs. The majority of the aforementioned graphs are either not publicly available or their graph structures are incomplete since they cannot record all the interactions in graph. Moreover, even through all the interactions are recorded, they may not be able to be shared in a public and timely evolving manner, such as social networks in companies like Meta and Tencent. Thus, for meaningful temporal graph analysis and benchmarks, we need some graph datasets that are openly available, dynamically evolving and encompass real interactions.

In this paper, we introduce the concept of Live Graph Lab for temporal graphs, which enables open, dynamic and real transaction graphs from blockchains. To enrich the current datasets and overcome their limitations, we instantiate a live graph with NFT transaction network by synchronizing a full Ethereum node, thus it continuously keeps up with the latest Ethereum block and includes all the transaction data.

In addition, we present a temporal graph extracted from a specific time period spanning from 2017 to 2022, which comprises over 4.5 million nodes and 124 million edges. Then, comprehensive analyses are performed, and the results demonstrate that our presented live graph exhibits a variety of characteristics, offering exciting opportunities for the graph community. We now offer access to the datasets through both the following Google Drive and Zenodo links.

Codes » Metadata » Link Prediction Data » Node Classification Data » CSM Data » Zenodo Backup»

Overview

Metadata

We extract all the blocks before Aug 1st, 2022 (i.e., from block #0 to block #15,255,104). Then, we parse all the transaction data and log data via toolkit Ethereum ETL. Through this way, we obtain a network with more than 4.5 million nodes and 124 million edges. The data are stored in a tabular format with the following headers: collection address, block number, from address, to address, token id, transaction hash, value and timestamp. The statistical information is as follows:

DescriptionsStatistics
Start date (mm-dd-yyyy, UTC)07-12-2017 13:49
End date (mm-dd-yyyy, UTC)08-01-2022 06:50
Number of NFT collections97,667
Number of NFT tokens77,991,885
Number of account addresses4,531,020
Number of transactions124,660,813

The data format is as follows:

                     collection address,block number,from address,to address,token id,transaction hash,value,timestamp
                     "0x55b9...",4012328,"0x51e1...","0x6c9a...","0x9000...","0xb31f...",0.0,1499867867
                     "0xf4b1...",4254954,"0x7307...","0x3a3f...","0x8000...","0x3409...",0.0,1504953873
                     ...
                

Visual Examples for Trade Patterns

We provide several visual examples for trade patterns as outlined below. As we can see from the figure and table, some transactions exhibit obvious patterns and structures. After several subsequent transactions, the tokens eventually return to their original holders.

from addressto addressNFTtokenidvaluetimestamp
Pattern 1
0x7440e1407f95f33206fb72464a63cd54b2ee62820xd39e456c22eabf3ab0e58bdeea269927c65f6263Azuki12158.981652208247
0xd39e456c22eabf3ab0e58bdeea269927c65f62630xeaaeac965449d2426f6f793770b4f3560eeb7c0fAzuki12159.691652208655
0xeaaeac965449d2426f6f793770b4f3560eeb7c0f0x7440e1407f95f33206fb72464a63cd54b2ee6282Azuki121511.991652210005
..................
Pattern 2
0x7ED07e5a6f3F30AeE3E1d8823b8b331C4bA6D0930xcc7cBFb9D98e364A78B9EFDa0Dd83122880b6e42OGCR91190.141634853524
0xcc7cBFb9D98e364A78B9EFDa0Dd83122880b6e420x8Bbc266Fe8A93186cea865d062A120805fBb25d2OGCR91190.1451634855712
0x8Bbc266Fe8A93186cea865d062A120805fBb25d20x54e3AEBCDc3d346fF1e24e9c01fc7184664Ce239OGCR91190.151634857462
0x54e3AEBCDc3d346fF1e24e9c01fc7184664Ce2390x7ED07e5a6f3F30AeE3E1d8823b8b331C4bA6D093OGCR91190.1551634859791
..................
Pattern 3
0xBB22904A513b7E3E39ee3aCd962C430989Add8f10x24aF5E16C9Fa5a4969499baAa290E679bc120161Phunk32041641900784
0x24aF5E16C9Fa5a4969499baAa290E679bc1201610xBB22904A513b7E3E39ee3aCd962C430989Add8f1Phunk32061641900973
0xBB22904A513b7E3E39ee3aCd962C430989Add8f10x5d84A71B2648bD0B4f55d0D20Cf99dbd65D64458Phunk3205.81641901272
0x5d84A71B2648bD0B4f55d0D20Cf99dbd65D644580xBB22904A513b7E3E39ee3aCd962C430989Add8f1Phunk32061641901732
..................
Pattern 4
0xcAf39ae584cD9d64b594Db907036D932D15a2a820x70316254c05eec4870B58Ff0D7267ca03D1b672aTAs67670.8251658174367
0x70316254c05eec4870B58Ff0D7267ca03D1b672a0xA95558E2C3BA5D4b13568716E1c043C18631Df63TAs67670.8461658213566
0xA95558E2C3BA5D4b13568716E1c043C18631Df630xcAf39ae584cD9d64b594Db907036D932D15a2a82TAs67670.8661658213924
0xcAf39ae584cD9d64b594Db907036D932D15a2a820xE616BFB003d67ef3a364f79D10e64F9349d2aE16TAs67670.8881658264899
0xE616BFB003d67ef3a364f79D10e64F9349d2aE160xcAf39ae584cD9d64b594Db907036D932D15a2a82TAs67670.8921658268808
..................
Pattern 5
0xa5c87DC380C263bBA1a3c49d9a7Bf582A56E94db0x3A6ee71b12AE5351Feba34dfF1e2fD27cD32a95dN44401.08721633897537
0x3A6ee71b12AE5351Feba34dfF1e2fD27cD32a95d0xa5c87DC380C263bBA1a3c49d9a7Bf582A56E94dbN444001633898802
0xa5c87DC380C263bBA1a3c49d9a7Bf582A56E94db0x2BC287DEEd55C675f3b7CD6B00c6d13aCBeE70A5N44401.14441633961991
0x2BC287DEEd55C675f3b7CD6B00c6d13aCBeE70A50xa5c87DC380C263bBA1a3c49d9a7Bf582A56E94dbN444001634002446
0xa5c87DC380C263bBA1a3c49d9a7Bf582A56E94db0xbD422Ca7d060b297D3b2de1ae27296A27765c824N44401.16471634073005
0xbD422Ca7d060b297D3b2de1ae27296A27765c8240xa5c87DC380C263bBA1a3c49d9a7Bf582A56E94dbN444001634117902
..................

Live Graph Query Component

We provide a component for users to query graphs by specifying the start and end timestamps, so that the researchers have the capability to construct their own customized datasets. At this stage, we are offering a prototype version and our ongoing efforts are focused on refining and enhancing the user experience. Our goal is to present this tool in a manner that is not only user-friendly but also stable and reliable. We are confident that the final version will offer a seamless and robust experience for users. Sometimes you might encounter errors due to the stability of our server or ongoing updates. We encourage you to make multiple attempts if this happens.

Query Entrance »

Link Prediction and Node Classification

For link prediction task, we remove all the transactions associated with the Null address, which results in 3.13 million nodes and 23.13 million edges in the directed graph. The data file includes four columns: from id, to id, edge weight and timestamp.

For node classification task, we first filter out nodes that only have one transaction. Then, each node' maximum transaction interval is calculated. If the maximum interval is within one day, we call it daily trader. Likewise, if the maximum interval is within one week and larger than one day, we call it weekly trader, and so forth. This process results in a large-scale directed graph with about 1.80 million nodes and 21.83 million edges. The data format is as follows:

                        fromID,toID,Weight,Timestamp
                        0,1,4,1499867388
                        0,2,5,1508115274
                        ...
                    
                        nodeID,Label
                        0,0
                        1,3
                        ...
                    

Continuous Subgraph Matching (CSM)

Similar to link prediction task, we also remove all the transactions associated with the Null address. We use NFT transactions from year 2017 to the end of 2021 as the initial graph, and then the transactions in the year of 2022 are regarded as the insertion streams. Since the original nodes and edges are unlabeled, we randomly assign one of 30 labels to each node, and we do not assign labels for edges and we use the edges' directions as labels. The graph update stream are structured as follows:

                        v 3 1
                        e 2 3 2
                        -v 2 1
                        -e 0 1 0
                    
Graph update stream is a collection of insertions and deletions of a vertex or an edge.
  • A vertex insertion is represented by v vertex-id vertex-label.
  • A vertex deletion is represented by -v vertex-id vertex-label.
  • An edge insertion is represented by e vertex-id-1 vertex-id-2 edge-label.
  • An edge deletion is represented by -e vertex-id-1 vertex-id-2 edge-label.
Each line in the query graph file represent a vertex or an edge.
  • A vertex is represented by v vertex-id vertex-label.
  • An edge is represented by e vertex-id-1 vertex-id-2 edge-label.
The two endpoints of an edge must appear before the edge. For example,
                        v 0 0
                        v 1 0
                        e 0 1 0
                        v 2 1
                        e 0 2 1
                        e 2 1 2
                    

Contact

Questions about Live Graph Lab, or want to get in touch? Contact us at livegraphlab.

Design and CSS courtesy: Rowan Zellers.

Template borrowed from TwiBot-22.