Neo4j: Analyzing the supplier's list of Apple and Samsung

6 min readFeb 3, 2024

In this article, we will review Neo4j basics by getting data about Apple and Samsung supplier lists. We are analyzing the supplier list of Apple and Samsung to compare what common suppliers they have and where they are located.

You can follow the tutorial here in the ipynb notebook

We started with the Apple supplier list from here and for Samsung, we scraped data from here. Here we are comparing the supplier location for both of these electronics companies and where are they located. We have different fields like the address name, country name and supplier name. We had to manually clean the data for some integrity.

DATA COLLECTION PIPELINE

The Apple supplier list was a PDF, so to scrape the relevant fields, we used Pymudf and with below code

from unstructured.partition.auto import partition
elements = partition("Apple-Supplier-List.pdf",include_page_breaks=True)

companies = []
places = []
countries = []

apple_suppliers_dict = {}
num_places = 1
num_countries = 1

for idx,elem in enumerate(elements):
    elem_dict = elem.to_dict()
    if elem_dict['type']=='PageBreak':continue
    top_left = elem_dict['metadata']['coordinates']['points'][0]

    top_left_x = top_left[0]
    #Supplier name in first column
    if 35<top_left_x<37:
        company_text = elem_dict['text']
        if company_text=="SUPPLIER NAME":continue
        companies.append(company_text)    
        apple_suppliers_dict.update({company_text:{"places":[],"countries":[]}})
        num_places = 1
        num_countries = 1
    # Location name in second column
    elif 469<top_left_x<471:
        place_text = elem_dict['text']
        if place_text=='PRIMARY LOCATIONS WHERE MANUFACTURING FOR APPLE OCCURS': continue
        places.append(place_text)
        apple_suppliers_dict[company_text]['places'].append((place_text,num_places))
        num_places+=1
    # Country name in third column    
    elif 889<top_left_x<891:
        country_text = elem_dict['text']
        countries.append(country_text)   
        apple_suppliers_dict[company_text]['countries'].append((country_text,num_countries))
        num_countries+=1

There was some missing data on the country name, so I had to do a manual adjustment

apple_suppliers_dict['Murata Manufacturing Company Limited'] = {'places': [('Guangdong, Jiangsu', 1),
                                                            ('Fukui, Fukushima, Ishikawa, Miyagi, Miyazaki, Nagano, Okayama,Shiga, Shimane, Toyama', 2),
                                                            ('Perak', 3),
                                                            ('Singapore', 4),
                                                            ('Taiwan', 5),
                                                            ('Lamphun', 6),
                                                            ('Da Nang, Tien Giang', 7)],
                                                            'countries': [('China mainland', 1),
                                                            ('Japan', 2),
                                                            ('Malaysia', 3),
                                                            ('Singapore', 4),
                                                            ('Taiwan', 5),
                                                            ('Thailand', 6),
                                                            ('Vietnam', 7)]}

faulty_dict = {}
for comp, place in apple_suppliers_dict.items():
    if len(place['places'])!=len(place['countries']):
        print(comp)
        faulty_dict = apple_suppliers_dict[comp]
#No faulty dict found now

Preparing the final data for Apple

import pandas as pd
apple_df = pd.DataFrame(columns=['Supplier','Address','Countries'])
apple_comp = []
apple_places = []
apple_countries = []
for comp, place in apple_suppliers_dict.items():
    for pl,vals in place.items():
        if pl=="places":
            for val in vals:
                apple_comp.append(comp)
                apple_places.append(",".join(val[0].split(",")))
        elif pl=="countries":
            for val in vals:
                apple_countries.append(val[0])
apple_df['Supplier'] = apple_comp
apple_df['Address'] = apple_places
apple_df['Countries'] = apple_countries

For Samsung, we had a pandas data frame

To separate the country names and Addresses, we use the following helper functions

import re
def get_country(addr:str):
    addr = addr.split(",")[-1]
    addr = re.sub("\n","",addr)
    return addr.strip()
def remove_country(addr:str):
    addr = ",".join(addr.split(",")[:-1])
    addr = re.sub("\n"," ",addr)
    return addr.strip()
sam_df['Countries'] = sam_df['Address'].apply(get_country)
sam_df['Address'] = sam_df['Address'].apply(remove_country)

THE TWO CSV FILES USED FOR THIS HOMEWORK ARE BELOW (Google Drive links)

CREATING THE DATABASE

FROM ARROW TOOL

The arrow tool relation for writing the schema in Neo4j

THE ABOVE SCHEMA IS A ROUGH SKETCH AND IN THE DATABASE, WE HAVE JUST CHANGED THE PROPERTY NAME LIKE `Company Name` BECAME name

A sample data insertion to Neo4j would look like the following

CREATE (apple:Company {name:'Apple 
Inc.',headquarter:'Cupertino 
California',sector:'Technology'}),(samsung:Company{name:'
Samsung Electronics Inc.',headquarter:'Suwon-si South 
Korea',sector:’Technology’})

We have three primary nodes Apple Inc., Samsung Electronics Inc. and Suppliers company name, which has multiple nodes. The relationship between the supplier and company is named supplier and each supplier node has properties on address and country

THE DATA INSERTION CODE IS VERY LONG, FOR BREVITY IN THE REPORT, PLEASE REFER TO THESE GOOGLE DRIVE LINKS FOR THE CODE USED

Apple Only data: (Example: CREATE (:Company {name:’Apple Inc.’})<-[:supplier]-(:Supplier {name:’3M’,short_address:’Alabama, Indiana, Iowa, Minnesota, Ohio, South Carolina, Wisconsin’,country:’United States’,full_address:’’}))
Samsung only data: (Example:CREATE (:Company {name:’Samsung Electronics Inc.’})<-[:supplier]-(:Supplier {name:’Adeka Corporation’,short_address:’Jeollabuk-Do’,country:’Korea’,full_address:’Yuhang Rd, Shuyang Economic Develop Suqian City, Jiangsu Province’}))
Both Apple and Samsung Supplier Data: (Example: CREATE (:Company {name:’Apple Inc.’})<-[:supplier]-(:Supplier {name:’AAC Acoustic Technologies Holdings Incorporated’,short_address:’Jiangsu’,country:’China’,full_address:’213167 №3 Changcao Road, Hi-Tech Industrial Zone, Wujin, Changzhou City, Jiangsu Province’})-[:supplier]->(:Company{name:’Samsung Electronics Inc.’}))

You can see in the common Apple and Samsung supplier are separated with a relation syntax. You don’t have to manually create these text files, you can use the following helper functions

APPLE ONLY SUPPLIERS

apple_file = open('apple.txt','a')
for idx,row in apple_df.iterrows():
    supplier_name = row['Supplier']
    if supplier_name in sam_suppliers: continue
    short_address_name = row['Address']
    country_name = row['Countries']
    full_address_name = ""
    # params = {"companyName":'Samsung Electronics Inc.',"supplierName":supplier_name,"short_address":short_address_name,"country":country_name,"full_address":{full_address_name}}
    cypher_query = f"CREATE (:Company {{name:'Apple Inc.'}})<-[:supplier]-(:Supplier {{name:'{supplier_name}',short_address:'{short_address_name}',country:'{country_name}',full_address:'{full_address_name}'}})\n"
    apple_file.write(cypher_query)

SAMSUNG ONLY SUPPLIERS

samsung_file = open('samsung.txt','a')
for idx,row in sam_procesed.iterrows():
    supplier_name = row['Supplier']
    if supplier_name in apple_suppliers: continue
    short_address_name = row['Address']
    country_name = row['Countries']
    full_address_name = row['Full_Address']
    prop_1 = "name"

    # params = {"companyName":'Samsung Electronics Inc.',"supplierName":supplier_name,"short_address":short_address_name,"country":country_name,"full_address":{full_address_name}}
    cypher_query = f"CREATE (:Company {{name:'Samsung Electronics Inc.'}})<-[:supplier]-(:Supplier {{name:'{supplier_name}',short_address:'{short_address_name}',country:'{country_name}',full_address:'{full_address_name}'}})\n"
    samsung_file.write(cypher_query)

COMBINED

comb_file = open('comb.txt','a')
for idx,row in sam_procesed.iterrows():
    supplier_name = row['Supplier']
    if supplier_name not in apple_suppliers: continue
    short_address_name = row['Address']
    country_name = row['Countries']
    full_address_name = row['Full_Address']
    
    cypher_query = f"CREATE (:Company {{name:'Apple Inc.'}})<-[:supplier]-(:Supplier {{name:'{supplier_name}',short_address:'{short_address_name}',country:'{country_name}',full_address:'{full_address_name}'}})-[:supplier]->(:Company{{name:'Samsung Electronics Inc.'}})\n"
    comb_file.write(cypher_query)

Now, you can copy paste these to Neo4j console and create the database

call db.schema.visualization;

DATA EXPLORATION

Suppliers that are located in China who supply Apple

MATCH (c:Company{name:'Apple Inc.'})<-[sr:supplier]-
(s:Supplier {country:'China'}) RETURN c,s,sr limit 3;

India-based suppliers to Samsung Electronics

MATCH (c:Company{name:'Samsung Electronics Inc.'})<-
[sr:supplier]-(s:Supplier {country:'India'}) RETURN c,s,sr;

Common suppliers of Apple and Samsung based in China

MATCH (c:Company{name:'Samsung Electronics Inc.'})<-
[sr:supplier]-(s:Supplier {country:'China'})-[:supplier]-
>(c1:Company {name:'Apple Inc.'}) RETURN c1,c,s,sr limit 3;

What does Samsung supply to Apple?

MATCH (c:Company{name:'Apple Inc.'})<-[:supplier]-
(s:Supplier) where s.name CONTAINS 'Samsung' return
c,s,s.country;

REGIONS WHERE SAMSUNG SUPPLIES APPLE

MATCH (c:Company{name:'Apple Inc.'})<-[:supplier]-
(s:Supplier) where s.name CONTAINS 'Samsung' return
s.country,s.short_address;

WHO ARE THE DISPLAY PROVIDERS FOR APPLE AND SAMSUNG?

APPLE

MATCH (c:Company{name:'Apple Inc.'})<-[:supplier]-
(s:Supplier) where s.name CONTAINS 'Display' return
s.country,s.short_address,s.name;

SAMSUNG

MATCH (c:Company{name:'Samsung Electronics Inc.'})<-
[:supplier]-(s:Supplier) where s.name CONTAINS 'Display'
return s.country,s.short_address,s.name;

TSMC SUPPLY REGIONS

MATCH (c:Company{name:'Apple Inc.'})<-[:supplier]-
(s:Supplier) where s.name CONTAINS 'Taiwan Semiconductor'
return s.country,s.short_address;

Which country has the most suppliers for Apple and Samsung respectively?