Importing large SQL dump with millions of INSERT statements
I need to import a big .sql file (8.1 GB when unpacked) into PostgreSQL. I tried to use \i /path/to/file.sql, but it is way too slow.
How does one speed up the import? I need to import this data weekly.
The first 2000 lines can be found here, while the compressed 1 GB dump can be found here.
--
-- PostgreSQL database dump
--
-- Dumped from database version 9.5.3
-- Dumped by pg_dump version 9.5.2
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;
--
-- Name: rpo; Type: SCHEMA; Schema: -; Owner: -
--
That's the only way I can get the data. The full file has approx. 38,000,000 lines. How can I speed up the import?
postgresql import postgresql-performance pg-dump
asked Aug 29 '18 at 10:24 by Ondrej Vrabel; edited Aug 29 '18 at 21:47 by Evan Carroll
2 Answers
This dump was generated as individual INSERT statements (with pg_dump --inserts):
INSERT INTO esa2010_codes VALUES (11002, 'Národn
INSERT INTO esa2010_codes VALUES (11003, 'Nefina
INSERT INTO esa2010_codes VALUES (12502, 'Národn
INSERT INTO esa2010_codes VALUES (11001, 'Verejn
INSERT INTO esa2010_codes VALUES (12602, 'Národn
INSERT INTO esa2010_codes VALUES (12603, 'Finanč
INSERT INTO esa2010_codes VALUES (12503, 'Ostatn
This is documented as being slow (from man pg_dump):
--inserts
Dump data as INSERT commands (rather than COPY). This will make restoration very slow; it is mainly useful for making dumps that can be loaded into non-PostgreSQL databases. However, since this option generates a separate command for each row, an error in reloading a row causes only that row to be lost rather than the entire table contents. Note that the restore might fail altogether if you have rearranged column order. The --column-inserts option is safe against column order changes, though even slower.
That's why it's so slow. What you're going to want to do is turn off some of the durability settings, specifically synchronous_commit, though turning off fsync will help too.
You can do this very simply by running the following command before you run your \i file.sql.
SET synchronous_commit TO off;
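For concreteness, here is a minimal sketch of the whole session, assuming the target database is mydb and the dump sits at /path/to/file.sql (both are placeholders, adjust to your setup):
-- inside a psql session connected to the target database
SET synchronous_commit TO off;   -- session-level setting, no server restart required
\timing on                       -- optional: report how long each step takes
\i /path/to/file.sql             -- run the dump
RESET synchronous_commit;        -- back to the default for this session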
Turning off synchronous_commit will do a lot to speed it up. Don't forget to turn the durability options back on after you're done. I bet it'll finish in a few hours once you set that. If you need more speed, don't hesitate to turn off fsync and full_page_writes on the cluster until you get the data loaded, though I wouldn't do it if the DB has data you need in it, or if it is production. As a last note, if you need the speed and this is a production DB, you can go all out on your own copy and dump it with the default pg_dump options, which you'll be able to load much faster.
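If you do go the fsync/full_page_writes route, a hedged sketch of the cluster-level toggle looks like this (only on a scratch or freshly restorable cluster; a crash with fsync off can corrupt the whole instance). Both settings are reloadable, so no restart is needed:
-- run as superuser; ALTER SYSTEM writes to postgresql.auto.conf
ALTER SYSTEM SET fsync = off;
ALTER SYSTEM SET full_page_writes = off;
SELECT pg_reload_conf();
-- ... load the data ...
ALTER SYSTEM RESET fsync;
ALTER SYSTEM RESET full_page_writes;
SELECT pg_reload_conf();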
answered Aug 29 '18 at 10:50 (edited Aug 29 '18 at 21:45) by Evan Carroll
Another option is running the import in a single transaction (if possible):
BEGIN;
\i dump.sql
COMMIT;
PostgreSQL runs in autocommit mode by default - that means every command is finished by a commit, and a commit is finished by an fsync (and fsync is pretty slow). The number of commits can be reduced by asynchronous commit (Evan Carroll's reply) or reduced to one by an explicit transaction.
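The same idea can also be expressed from the shell; a minimal sketch, assuming the target database is mydb (a placeholder). psql's -1/--single-transaction flag wraps the whole script in one transaction, and ON_ERROR_STOP makes any failure roll the whole load back:
# placeholder database name and path
psql -d mydb -1 -v ON_ERROR_STOP=1 -f /path/to/dump.sql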
Another possibility is disabling the referential-integrity checks (if any are used). This is safe here because we can expect the dump to be consistent and correct. See the command ALTER TABLE xx DISABLE TRIGGER ALL for details.
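A short sketch of that technique, using one of the tables from the dump purely as an example (it assumes the schema already exists and only data is being loaded; disabling the internally generated foreign-key triggers requires superuser rights, and it also skips any user triggers):
ALTER TABLE esa2010_codes DISABLE TRIGGER ALL;  -- repeat per table with constraints/triggers
\i dump.sql
ALTER TABLE esa2010_codes ENABLE TRIGGER ALL;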
The source of your file is pg_dump. The simplest speedup comes from choosing the right options when the dump is created:
- Don't use the --inserts option. The COPY format is significantly faster to restore.
- Use the --disable-triggers option to disable the RI checks (it expects correct data).
- Use the custom format (the -F option). Then you can use pg_restore to restore and to build indexes (the slowest operation) in parallel (see the sketch below).
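A minimal sketch of that last variant, with placeholder database names and paths; pg_restore's -j flag requires the custom (or directory) format and runs data loading and index builds in parallel:
# on the source side: custom-format dump (uses COPY internally)
pg_dump -F c -f /path/to/dump.custom sourcedb
# on the target side: parallel restore; tune -j to your CPUs and disks
pg_restore -d targetdb -j 4 /path/to/dump.custom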
answered Aug 31 '18 at 5:45 by Pavel Stehule