Neo4j: Analyzing the supplier's list of Apple and Samsung

Astarag Mohapatra
6 min readFeb 3, 2024

In this article, we will review Neo4j basics by getting data about Apple and Samsung supplier lists. We are analyzing the supplier list of Apple and Samsung to compare what common suppliers they have and where they are located.

You can follow the tutorial here in the ipynb notebook

We started with the Apple supplier list from here and for Samsung, we scraped data from here. Here we are comparing the supplier location for both of these electronics companies and where are they located. We have different fields like the address name, country name and supplier name. We had to manually clean the data for some integrity.

DATA COLLECTION PIPELINE

The Apple supplier list was a PDF, so to scrape the relevant fields, we used Pymudf and with below code

from unstructured.partition.auto import partition
elements = partition("Apple-Supplier-List.pdf",include_page_breaks=True)

companies = []
places = []
countries = []

apple_suppliers_dict = {}
num_places = 1
num_countries = 1

for idx,elem in enumerate(elements):
elem_dict = elem.to_dict()
if elem_dict['type']=='PageBreak':continue
top_left = elem_dict['metadata']['coordinates']['points'][0]

top_left_x = top_left[0]
#Supplier name in first column
if 35<top_left_x<37:
company_text = elem_dict['text']
if company_text=="SUPPLIER NAME":continue
companies.append(company_text)
apple_suppliers_dict.update({company_text:{"places":[],"countries":[]}})
num_places = 1
num_countries = 1
# Location name in second column
elif 469<top_left_x<471:
place_text = elem_dict['text']
if place_text=='PRIMARY LOCATIONS WHERE MANUFACTURING FOR APPLE OCCURS': continue
places.append(place_text)
apple_suppliers_dict[company_text]['places'].append((place_text,num_places))
num_places+=1
# Country name in third column
elif 889<top_left_x<891:
country_text = elem_dict['text']
countries.append(country_text)
apple_suppliers_dict[company_text]['countries'].append((country_text,num_countries))
num_countries+=1

There was some missing data on the country name, so I had to do a manual adjustment

apple_suppliers_dict['Murata Manufacturing Company Limited'] = {'places': [('Guangdong, Jiangsu', 1),
('Fukui, Fukushima, Ishikawa, Miyagi, Miyazaki, Nagano, Okayama,Shiga, Shimane, Toyama', 2),
('Perak', 3),
('Singapore', 4),
('Taiwan', 5),
('Lamphun', 6),
('Da Nang, Tien Giang', 7)],
'countries': [('China mainland', 1),
('Japan', 2),
('Malaysia', 3),
('Singapore', 4),
('Taiwan', 5),
('Thailand', 6),
('Vietnam', 7)]}

faulty_dict = {}
for comp, place in apple_suppliers_dict.items():
if len(place['places'])!=len(place['countries']):
print(comp)
faulty_dict = apple_suppliers_dict[comp]
#No faulty dict found now

Preparing the final data for Apple

import pandas as pd
apple_df = pd.DataFrame(columns=['Supplier','Address','Countries'])
apple_comp = []
apple_places = []
apple_countries = []
for comp, place in apple_suppliers_dict.items():
for pl,vals in place.items():
if pl=="places":
for val in vals:
apple_comp.append(comp)
apple_places.append(",".join(val[0].split(",")))
elif pl=="countries":
for val in vals:
apple_countries.append(val[0])
apple_df['Supplier'] = apple_comp
apple_df['Address'] = apple_places
apple_df['Countries'] = apple_countries
Apple Suppliers DataFrame

For Samsung, we had a pandas data frame

To separate the country names and Addresses, we use the following helper functions

import re
def get_country(addr:str):
addr = addr.split(",")[-1]
addr = re.sub("\n","",addr)
return addr.strip()
def remove_country(addr:str):
addr = ",".join(addr.split(",")[:-1])
addr = re.sub("\n"," ",addr)
return addr.strip()
sam_df['Countries'] = sam_df['Address'].apply(get_country)
sam_df['Address'] = sam_df['Address'].apply(remove_country)

THE TWO CSV FILES USED FOR THIS HOMEWORK ARE BELOW (Google Drive links)

  1. Apple Supplier List
  2. Samsung Supplier List

CREATING THE DATABASE

FROM ARROW TOOL

The arrow tool relation for writing the schema in Neo4j

THE ABOVE SCHEMA IS A ROUGH SKETCH AND IN THE DATABASE, WE HAVE JUST CHANGED THE PROPERTY NAME LIKE `Company Name` BECAME name

A sample data insertion to Neo4j would look like the following

CREATE (apple:Company {name:'Apple 
Inc.',headquarter:'Cupertino
California',sector:'Technology'}),(samsung:Company{name:'
Samsung Electronics Inc.',headquarter:'Suwon-si South
Korea',sector:’Technology’})

We have three primary nodes Apple Inc., Samsung Electronics Inc. and Suppliers company name, which has multiple nodes. The relationship between the supplier and company is named supplier and each supplier node has properties on address and country

THE DATA INSERTION CODE IS VERY LONG, FOR BREVITY IN THE REPORT, PLEASE REFER TO THESE GOOGLE DRIVE LINKS FOR THE CODE USED

  1. Apple Only data: (Example: CREATE (:Company {name:’Apple Inc.’})<-[:supplier]-(:Supplier {name:’3M’,short_address:’Alabama, Indiana, Iowa, Minnesota, Ohio, South Carolina, Wisconsin’,country:’United States’,full_address:’’}))
  2. Samsung only data: (Example:CREATE (:Company {name:’Samsung Electronics Inc.’})<-[:supplier]-(:Supplier {name:’Adeka Corporation’,short_address:’Jeollabuk-Do’,country:’Korea’,full_address:’Yuhang Rd, Shuyang Economic Develop Suqian City, Jiangsu Province’}))
  3. Both Apple and Samsung Supplier Data: (Example: CREATE (:Company {name:’Apple Inc.’})<-[:supplier]-(:Supplier {name:’AAC Acoustic Technologies Holdings Incorporated’,short_address:’Jiangsu’,country:’China’,full_address:’213167 №3 Changcao Road, Hi-Tech Industrial Zone, Wujin, Changzhou City, Jiangsu Province’})-[:supplier]->(:Company{name:’Samsung Electronics Inc.’}))

You can see in the common Apple and Samsung supplier are separated with a relation syntax. You don’t have to manually create these text files, you can use the following helper functions

APPLE ONLY SUPPLIERS

apple_file = open('apple.txt','a')
for idx,row in apple_df.iterrows():
supplier_name = row['Supplier']
if supplier_name in sam_suppliers: continue
short_address_name = row['Address']
country_name = row['Countries']
full_address_name = ""
# params = {"companyName":'Samsung Electronics Inc.',"supplierName":supplier_name,"short_address":short_address_name,"country":country_name,"full_address":{full_address_name}}
cypher_query = f"CREATE (:Company {{name:'Apple Inc.'}})<-[:supplier]-(:Supplier {{name:'{supplier_name}',short_address:'{short_address_name}',country:'{country_name}',full_address:'{full_address_name}'}})\n"
apple_file.write(cypher_query)

SAMSUNG ONLY SUPPLIERS

samsung_file = open('samsung.txt','a')
for idx,row in sam_procesed.iterrows():
supplier_name = row['Supplier']
if supplier_name in apple_suppliers: continue
short_address_name = row['Address']
country_name = row['Countries']
full_address_name = row['Full_Address']
prop_1 = "name"

# params = {"companyName":'Samsung Electronics Inc.',"supplierName":supplier_name,"short_address":short_address_name,"country":country_name,"full_address":{full_address_name}}
cypher_query = f"CREATE (:Company {{name:'Samsung Electronics Inc.'}})<-[:supplier]-(:Supplier {{name:'{supplier_name}',short_address:'{short_address_name}',country:'{country_name}',full_address:'{full_address_name}'}})\n"
samsung_file.write(cypher_query)

COMBINED

comb_file = open('comb.txt','a')
for idx,row in sam_procesed.iterrows():
supplier_name = row['Supplier']
if supplier_name not in apple_suppliers: continue
short_address_name = row['Address']
country_name = row['Countries']
full_address_name = row['Full_Address']

cypher_query = f"CREATE (:Company {{name:'Apple Inc.'}})<-[:supplier]-(:Supplier {{name:'{supplier_name}',short_address:'{short_address_name}',country:'{country_name}',full_address:'{full_address_name}'}})-[:supplier]->(:Company{{name:'Samsung Electronics Inc.'}})\n"
comb_file.write(cypher_query)

Now, you can copy paste these to Neo4j console and create the database

call db.schema.visualization;
Database Schema

DATA EXPLORATION

  • Suppliers that are located in China who supply Apple
MATCH (c:Company{name:'Apple Inc.'})<-[sr:supplier]-
(s:Supplier {country:'China'}) RETURN c,s,sr limit 3;
  • India-based suppliers to Samsung Electronics
MATCH (c:Company{name:'Samsung Electronics Inc.'})<-
[sr:supplier]-(s:Supplier {country:'India'}) RETURN c,s,sr;
  • Common suppliers of Apple and Samsung based in China
MATCH (c:Company{name:'Samsung Electronics Inc.'})<-
[sr:supplier]-(s:Supplier {country:'China'})-[:supplier]-
>(c1:Company {name:'Apple Inc.'}) RETURN c1,c,s,sr limit 3;
  • What does Samsung supply to Apple?
MATCH (c:Company{name:'Apple Inc.'})<-[:supplier]-
(s:Supplier) where s.name CONTAINS 'Samsung' return
c,s,s.country;
  • REGIONS WHERE SAMSUNG SUPPLIES APPLE
MATCH (c:Company{name:'Apple Inc.'})<-[:supplier]-
(s:Supplier) where s.name CONTAINS 'Samsung' return
s.country,s.short_address;
  • WHO ARE THE DISPLAY PROVIDERS FOR APPLE AND SAMSUNG?

APPLE

MATCH (c:Company{name:'Apple Inc.'})<-[:supplier]-
(s:Supplier) where s.name CONTAINS 'Display' return
s.country,s.short_address,s.name;

SAMSUNG

MATCH (c:Company{name:'Samsung Electronics Inc.'})<-
[:supplier]-(s:Supplier) where s.name CONTAINS 'Display'
return s.country,s.short_address,s.name;
  • TSMC SUPPLY REGIONS
MATCH (c:Company{name:'Apple Inc.'})<-[:supplier]-
(s:Supplier) where s.name CONTAINS 'Taiwan Semiconductor'
return s.country,s.short_address;
  • Which country has the most suppliers for Apple and Samsung respectively?

APPLE


MATCH (c:Company{name:'Apple Inc.'})<-[:supplier]-
(s:Supplier) return COUNT(*) AS `Suppliers Count`,s.country
order by `Suppliers Count` desc;

SAMSUNG

MATCH (c:Company{name:'Samsung Electronics Inc.'})<-
[:supplier]-(s:Supplier) return COUNT(*) AS `Suppliers
Count`,s.country order by `Suppliers Count` desc;

This brings us to the end of the tutorial on Neo4j, where we demonstrated an end-to-end pipeline for data exploration of the Apple and Samsung suppliers list. If you have any queries, I am one comment away.

--

--

Astarag Mohapatra

Hi Astarag here, I am interested in topics about Deep learning and other topics. If you have any queries I am one comment away