TPC-H Load Methods by Database


# Reference Pages

1. MySQL:

2. Hive:

3. Oracle:

> Load (data ingestion)
- Measuring TPC-H database performance: https://ycseo.tistory.com/21
- TPC-H #1. Preparation: https://www.haedongg.net/?p=355

> Verification
- Query the tables that belong to a tablespace (by owner): see the sketch below

- Complete rundown of Oracle tablespace usage (query, create, drop, etc.):

- Query tablespace capacity and related details: see the sketch below
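Since the original reference links did not survive, here is a minimal sketch of the usual dictionary-view queries; 'TPCH_TS' is a hypothetical tablespace name, and the DBA views require appropriate privileges:

-- Tables that belong to a tablespace, by owner
SELECT owner, table_name
  FROM dba_tables
 WHERE tablespace_name = 'TPCH_TS'   -- hypothetical tablespace name
 ORDER BY owner, table_name;

-- Tablespace capacity: allocated vs. free space, in MB
SELECT df.tablespace_name,
       ROUND(SUM(df.bytes) / 1024 / 1024) AS alloc_mb,
       MAX(fs.free_mb)                    AS free_mb
  FROM dba_data_files df
  JOIN (SELECT tablespace_name, ROUND(SUM(bytes) / 1024 / 1024) AS free_mb
          FROM dba_free_space
         GROUP BY tablespace_name) fs
    ON fs.tablespace_name = df.tablespace_name
 GROUP BY df.tablespace_name;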

4. Sybase:

## Query Guide

-- Query 21)
select s_name, n_name,
       count(*) as numwait
  from supplier, lineitem l1, orders, nation
 where s_suppkey = l1.l_suppkey
   and o_orderkey = l1.l_orderkey
   and o_orderstatus = 'F'
   and l1.l_receiptdate > l1.l_commitdate
   and exists ( select *
                  from lineitem l2
                 where l2.l_orderkey = l1.l_orderkey
                   and l2.l_suppkey <> l1.l_suppkey )
   and s_nationkey = n_nationkey
   and n_name = 'SAUDI ARABIA'
 group by s_name, n_name;

     
-- Query 22)
select cntrycode,
       count(*) as numcust,
       sum(c_acctbal) as totacctbal
  from ( select substring(c_phone, 1, 2) as cntrycode,
                c_acctbal
           from customer
          where substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17')
            and c_acctbal > ( select avg(c_acctbal)
                                from customer
                               where c_acctbal > 0.00
                                 and substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17') )
            and not exists ( select *
                               from orders
                              where o_custkey = c_custkey ) ) as custsale
 group by cntrycode
 order by cntrycode;
-- Note for Oracle: write substr(...) instead of substring(...), and drop the AS before the custsale alias.
 

- Generate the .tbl files with dbgen (dbgen -s 10) -> creates a ~10 GB dataset
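A minimal sketch of that generation step, assuming the TPC-H tools are unpacked under the path used later in this post (the Makefile settings depend on your platform):

$ cd TPC-H_Tools_v3.0.0/dbgen
$ cp makefile.suite makefile     # then set CC/DATABASE/MACHINE/WORKLOAD for your platform
$ make
$ ./dbgen -s 10                  # scale factor 10 -> roughly 10 GB of *.tbl files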

[oracle@oracle-server tpch]$ ls -al
total 10968908
drwxr-xr-x  2 root root        156 Feb 16 15:49 .
drwxrwxr-x+ 6 root root         56 Feb 16 15:47 ..
-rw-r--r--  1 root root  244847642 Feb 16 15:39 customer.tbl
-rw-r--r--  1 root root 7775727688 Feb 16 15:39 lineitem.tbl
-rw-r--r--  1 root root       2224 Feb 16 15:39 nation.tbl
-rw-r--r--  1 root root 1749195031 Feb 16 15:39 orders.tbl
-rw-r--r--  1 root root  243336157 Feb 16 15:39 part.tbl
-rw-r--r--  1 root root 1204850769 Feb 16 15:39 partsupp.tbl
-rw-r--r--  1 root root        389 Feb 16 15:39 region.tbl
-rw-r--r--  1 root root   14176368 Feb 16 15:39 supplier.tbl

[oracle@oracle-server tpch]$ sqlplus tpch/tpch
SQL*Plus: Release 11.2.0.1.0 Production on Wed Feb 16 15:52:23 2022
Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

DROP TABLE tpch.ext_part CASCADE CONSTRAINTS;
DROP TABLE tpch.ext_supplier CASCADE CONSTRAINTS;
DROP TABLE tpch.ext_partsupp CASCADE CONSTRAINTS;
DROP TABLE tpch.ext_customer CASCADE CONSTRAINTS;
DROP TABLE tpch.ext_orders CASCADE CONSTRAINTS;
DROP TABLE tpch.ext_lineitem CASCADE CONSTRAINTS;
DROP TABLE tpch.ext_nation CASCADE CONSTRAINTS;
DROP TABLE tpch.ext_region CASCADE CONSTRAINTS;

DROP TABLE tpch.part CASCADE CONSTRAINTS;
DROP TABLE tpch.supplier CASCADE CONSTRAINTS;
DROP TABLE tpch.partsupp CASCADE CONSTRAINTS;
DROP TABLE tpch.customer CASCADE CONSTRAINTS;
DROP TABLE tpch.orders CASCADE CONSTRAINTS;
DROP TABLE tpch.lineitem CASCADE CONSTRAINTS;
DROP TABLE tpch.nation CASCADE CONSTRAINTS;
DROP TABLE tpch.region CASCADE CONSTRAINTS;


CREATE OR REPLACE DIRECTORY TPCH_DIR AS '/home/oracle/tpch/tpch/TPC-H_Tools_v3.0.0/dbgen/oracle20g';
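If the directory object is created by a privileged user rather than by tpch itself, the tpch schema also needs access to it; a one-line sketch:

GRANT READ, WRITE ON DIRECTORY tpch_dir TO tpch;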

CREATE TABLE tpch.ext_part
(
    p_partkey       NUMBER(10, 0),
    p_name          VARCHAR2(55),
    p_mfgr          CHAR(25),
    p_brand         CHAR(10),
    p_type          VARCHAR2(25),
    p_size          INTEGER,
    p_container     CHAR(10),
    p_retailprice   NUMBER,
    p_comment       VARCHAR2(23)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('part.tbl'));


CREATE TABLE tpch.part
(
    p_partkey       NUMBER(10, 0) NOT NULL,
    p_name          VARCHAR2(55) NOT NULL,
    p_mfgr          CHAR(25) NOT NULL,
    p_brand         CHAR(10) NOT NULL,
    p_type          VARCHAR2(25) NOT NULL,
    p_size          INTEGER NOT NULL,
    p_container     CHAR(10) NOT NULL,
    p_retailprice   NUMBER NOT NULL,
    p_comment       VARCHAR2(23) NOT NULL
);


CREATE TABLE tpch.ext_supplier
(
    s_suppkey     NUMBER(10, 0),
    s_name        CHAR(25),
    s_address     VARCHAR2(40),
    s_nationkey   NUMBER(10, 0),
    s_phone       CHAR(15),
    s_acctbal     NUMBER,
    s_comment     VARCHAR2(101)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('supplier.tbl'));


CREATE TABLE tpch.supplier
(
    s_suppkey     NUMBER(10, 0) NOT NULL,
    s_name        CHAR(25) NOT NULL,
    s_address     VARCHAR2(40) NOT NULL,
    s_nationkey   NUMBER(10, 0) NOT NULL,
    s_phone       CHAR(15) NOT NULL,
    s_acctbal     NUMBER NOT NULL,
    s_comment     VARCHAR2(101) NOT NULL
);

CREATE TABLE tpch.ext_partsupp
(
    ps_partkey      NUMBER(10, 0),
    ps_suppkey      NUMBER(10, 0),
    ps_availqty     INTEGER,
    ps_supplycost   NUMBER,
    ps_comment      VARCHAR2(199)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('partsupp.tbl'));

CREATE TABLE tpch.partsupp
(
    ps_partkey      NUMBER(10, 0) NOT NULL,
    ps_suppkey      NUMBER(10, 0) NOT NULL,
    ps_availqty     INTEGER NOT NULL,
    ps_supplycost   NUMBER NOT NULL,
    ps_comment      VARCHAR2(199) NOT NULL
);


CREATE TABLE tpch.ext_customer
(
    c_custkey      NUMBER(10, 0),
    c_name         VARCHAR2(25),
    c_address      VARCHAR2(40),
    c_nationkey    NUMBER(10, 0),
    c_phone        CHAR(15),
    c_acctbal      NUMBER,
    c_mktsegment   CHAR(10),
    c_comment      VARCHAR2(117)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('customer.tbl'));

CREATE TABLE tpch.customer
(
    c_custkey      NUMBER(10, 0) NOT NULL,
    c_name         VARCHAR2(25) NOT NULL,
    c_address      VARCHAR2(40) NOT NULL,
    c_nationkey    NUMBER(10, 0) NOT NULL,
    c_phone        CHAR(15) NOT NULL,
    c_acctbal      NUMBER NOT NULL,
    c_mktsegment   CHAR(10) NOT NULL,
    c_comment      VARCHAR2(117) NOT NULL
);

CREATE TABLE tpch.ext_orders
(
    o_orderkey        NUMBER(10, 0),
    o_custkey         NUMBER(10, 0),
    o_orderstatus     CHAR(1),
    o_totalprice      NUMBER,
    o_orderdate       CHAR(10),
    o_orderpriority   CHAR(15),
    o_clerk           CHAR(15),
    o_shippriority    INTEGER,
    o_comment         VARCHAR2(79)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('orders.tbl'));

CREATE TABLE tpch.orders
(
    o_orderkey        NUMBER(10, 0) NOT NULL,
    o_custkey         NUMBER(10, 0) NOT NULL,
    o_orderstatus     CHAR(1) NOT NULL,
    o_totalprice      NUMBER NOT NULL,
    o_orderdate       DATE NOT NULL,
    o_orderpriority   CHAR(15) NOT NULL,
    o_clerk           CHAR(15) NOT NULL,
    o_shippriority    INTEGER NOT NULL,
    o_comment         VARCHAR2(79) NOT NULL
);

CREATE TABLE tpch.ext_lineitem
(
    l_orderkey        NUMBER(10, 0),
    l_partkey         NUMBER(10, 0),
    l_suppkey         NUMBER(10, 0),
    l_linenumber      INTEGER,
    l_quantity        NUMBER,
    l_extendedprice   NUMBER,
    l_discount        NUMBER,
    l_tax             NUMBER,
    l_returnflag      CHAR(1),
    l_linestatus      CHAR(1),
    l_shipdate        CHAR(10),
    l_commitdate      CHAR(10),
    l_receiptdate     CHAR(10),
    l_shipinstruct    CHAR(25),
    l_shipmode        CHAR(10),
    l_comment         VARCHAR2(44)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('lineitem.tbl'));


CREATE TABLE tpch.lineitem
(
    l_orderkey        NUMBER(10, 0),
    l_partkey         NUMBER(10, 0),
    l_suppkey         NUMBER(10, 0),
    l_linenumber      INTEGER,
    l_quantity        NUMBER,
    l_extendedprice   NUMBER,
    l_discount        NUMBER,
    l_tax             NUMBER,
    l_returnflag      CHAR(1),
    l_linestatus      CHAR(1),
    l_shipdate        DATE,
    l_commitdate      DATE,
    l_receiptdate     DATE,
    l_shipinstruct    CHAR(25),
    l_shipmode        CHAR(10),
    l_comment         VARCHAR2(44)
);


CREATE TABLE tpch.ext_nation
(
    n_nationkey   NUMBER(10, 0),
    n_name        CHAR(25),
    n_regionkey   NUMBER(10, 0),
    n_comment     VARCHAR(152)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('nation.tbl'));

CREATE TABLE tpch.nation
(
    n_nationkey   NUMBER(10, 0),
    n_name        CHAR(25),
    n_regionkey   NUMBER(10, 0),
    n_comment     VARCHAR(152)
);

CREATE TABLE tpch.ext_region
(
    r_regionkey   NUMBER(10, 0),
    r_name        CHAR(25),
    r_comment     VARCHAR(152)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('region.tbl'));

CREATE TABLE tpch.region
(
    r_regionkey   NUMBER(10, 0),
    r_name        CHAR(25),
    r_comment     VARCHAR(152)
);


TRUNCATE TABLE tpch.part;
TRUNCATE TABLE tpch.supplier;
TRUNCATE TABLE tpch.partsupp;
TRUNCATE TABLE tpch.customer;
TRUNCATE TABLE tpch.orders;
TRUNCATE TABLE tpch.lineitem;
TRUNCATE TABLE tpch.nation;
TRUNCATE TABLE tpch.region;

-- The external tables expose the date columns as CHAR(10) in 'YYYY-MM-DD' form; setting the
-- session date format lets the INSERT ... SELECT statements below convert them to DATE implicitly.
ALTER SESSION SET nls_date_format='YYYY-MM-DD';

INSERT /*+ APPEND */ INTO  tpch.part     SELECT * FROM tpch.ext_part;
INSERT /*+ APPEND */ INTO  tpch.supplier SELECT * FROM tpch.ext_supplier;
INSERT /*+ APPEND */ INTO  tpch.partsupp SELECT * FROM tpch.ext_partsupp;
INSERT /*+ APPEND */ INTO  tpch.customer SELECT * FROM tpch.ext_customer;
INSERT /*+ APPEND */ INTO  tpch.orders   SELECT * FROM tpch.ext_orders;
INSERT /*+ APPEND */ INTO  tpch.lineitem SELECT * FROM tpch.ext_lineitem;
INSERT /*+ APPEND */ INTO  tpch.nation   SELECT * FROM tpch.ext_nation;
INSERT /*+ APPEND */ INTO  tpch.region   SELECT * FROM tpch.ext_region;
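If you would rather not rely on the session NLS setting, the conversion can be written explicitly. A sketch for the orders load (the same idea applies to the three lineitem date columns):

INSERT /*+ APPEND */ INTO tpch.orders
SELECT o_orderkey, o_custkey, o_orderstatus, o_totalprice,
       TO_DATE(o_orderdate, 'YYYY-MM-DD'),   -- explicit CHAR(10) -> DATE conversion
       o_orderpriority, o_clerk, o_shippriority, o_comment
  FROM tpch.ext_orders;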


--- ORACLE 30G
-- (assumes a directory object tpch30_dir pointing at the 30 GB .tbl files has been created, analogous to TPCH_DIR above)

CREATE TABLE G1.ext_part
(
    p_partkey       NUMBER(10, 0),
    p_name          VARCHAR2(55),
    p_mfgr          CHAR(25),
    p_brand         CHAR(10),
    p_type          VARCHAR2(25),
    p_size          INTEGER,
    p_container     CHAR(10),
    p_retailprice   NUMBER,
    p_comment       VARCHAR2(23)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch30_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('part.tbl'));

CREATE TABLE g1.part
(
    p_partkey       NUMBER(10, 0) NOT NULL,
    p_name          VARCHAR2(55) NOT NULL,
    p_mfgr          CHAR(25) NOT NULL,
    p_brand         CHAR(10) NOT NULL,
    p_type          VARCHAR2(25) NOT NULL,
    p_size          INTEGER NOT NULL,
    p_container     CHAR(10) NOT NULL,
    p_retailprice   NUMBER NOT NULL,
    p_comment       VARCHAR2(23) NOT NULL
);


CREATE TABLE g1.ext_supplier
(
    s_suppkey     NUMBER(10, 0),
    s_name        CHAR(25),
    s_address     VARCHAR2(40),
    s_nationkey   NUMBER(10, 0),
    s_phone       CHAR(15),
    s_acctbal     NUMBER,
    s_comment     VARCHAR2(101)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch30_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('supplier.tbl'));

CREATE TABLE g1.supplier
(
    s_suppkey     NUMBER(10, 0) NOT NULL,
    s_name        CHAR(25) NOT NULL,
    s_address     VARCHAR2(40) NOT NULL,
    s_nationkey   NUMBER(10, 0) NOT NULL,
    s_phone       CHAR(15) NOT NULL,
    s_acctbal     NUMBER NOT NULL,
    s_comment     VARCHAR2(101) NOT NULL
);

CREATE TABLE g1.ext_partsupp
(
    ps_partkey      NUMBER(10, 0),
    ps_suppkey      NUMBER(10, 0),
    ps_availqty     INTEGER,
    ps_supplycost   NUMBER,
    ps_comment      VARCHAR2(199)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch30_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('partsupp.tbl'));

CREATE TABLE g1.partsupp
(
    ps_partkey      NUMBER(10, 0) NOT NULL,
    ps_suppkey      NUMBER(10, 0) NOT NULL,
    ps_availqty     INTEGER NOT NULL,
    ps_supplycost   NUMBER NOT NULL,
    ps_comment      VARCHAR2(199) NOT NULL
);

CREATE TABLE g1.ext_customer
(
    c_custkey      NUMBER(10, 0),
    c_name         VARCHAR2(25),
    c_address      VARCHAR2(40),
    c_nationkey    NUMBER(10, 0),
    c_phone        CHAR(15),
    c_acctbal      NUMBER,
    c_mktsegment   CHAR(10),
    c_comment      VARCHAR2(117)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch30_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('customer.tbl'));

CREATE TABLE g1.customer
(
    c_custkey      NUMBER(10, 0) NOT NULL,
    c_name         VARCHAR2(25) NOT NULL,
    c_address      VARCHAR2(40) NOT NULL,
    c_nationkey    NUMBER(10, 0) NOT NULL,
    c_phone        CHAR(15) NOT NULL,
    c_acctbal      NUMBER NOT NULL,
    c_mktsegment   CHAR(10) NOT NULL,
    c_comment      VARCHAR2(117) NOT NULL
);

-- read date values as yyyy-mm-dd text

CREATE TABLE g1.ext_orders
(
    o_orderkey        NUMBER(10, 0),
    o_custkey         NUMBER(10, 0),
    o_orderstatus     CHAR(1),
    o_totalprice      NUMBER,
    o_orderdate       CHAR(10),
    o_orderpriority   CHAR(15),
    o_clerk           CHAR(15),
    o_shippriority    INTEGER,
    o_comment         VARCHAR2(79)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch30_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('orders.tbl'));

CREATE TABLE g1.orders
(
    o_orderkey        NUMBER(10, 0) NOT NULL,
    o_custkey         NUMBER(10, 0) NOT NULL,
    o_orderstatus     CHAR(1) NOT NULL,
    o_totalprice      NUMBER NOT NULL,
    o_orderdate       DATE NOT NULL,
    o_orderpriority   CHAR(15) NOT NULL,
    o_clerk           CHAR(15) NOT NULL,
    o_shippriority    INTEGER NOT NULL,
    o_comment         VARCHAR2(79) NOT NULL
);

-- read date values as yyyy-mm-dd text

CREATE TABLE g1.ext_lineitem
(
    l_orderkey        NUMBER(10, 0),
    l_partkey         NUMBER(10, 0),
    l_suppkey         NUMBER(10, 0),
    l_linenumber      INTEGER,
    l_quantity        NUMBER,
    l_extendedprice   NUMBER,
    l_discount        NUMBER,
    l_tax             NUMBER,
    l_returnflag      CHAR(1),
    l_linestatus      CHAR(1),
    l_shipdate        CHAR(10),
    l_commitdate      CHAR(10),
    l_receiptdate     CHAR(10),
    l_shipinstruct    CHAR(25),
    l_shipmode        CHAR(10),
    l_comment         VARCHAR2(44)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch30_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('lineitem.tbl'));

CREATE TABLE g1.lineitem
(
    l_orderkey        NUMBER(10, 0),
    l_partkey         NUMBER(10, 0),
    l_suppkey         NUMBER(10, 0),
    l_linenumber      INTEGER,
    l_quantity        NUMBER,
    l_extendedprice   NUMBER,
    l_discount        NUMBER,
    l_tax             NUMBER,
    l_returnflag      CHAR(1),
    l_linestatus      CHAR(1),
    l_shipdate        DATE,
    l_commitdate      DATE,
    l_receiptdate     DATE,
    l_shipinstruct    CHAR(25),
    l_shipmode        CHAR(10),
    l_comment         VARCHAR2(44)
);

CREATE TABLE g1.ext_nation
(
    n_nationkey   NUMBER(10, 0),
    n_name        CHAR(25),
    n_regionkey   NUMBER(10, 0),
    n_comment     VARCHAR(152)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch30_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('nation.tbl'));

CREATE TABLE g1.nation
(
    n_nationkey   NUMBER(10, 0),
    n_name        CHAR(25),
    n_regionkey   NUMBER(10, 0),
    n_comment     VARCHAR(152)
);

CREATE TABLE g1.ext_region
(
    r_regionkey   NUMBER(10, 0),
    r_name        CHAR(25),
    r_comment     VARCHAR(152)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch30_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('region.tbl'));

CREATE TABLE g1.region
(
    r_regionkey   NUMBER(10, 0),
    r_name        CHAR(25),
    r_comment     VARCHAR(152)
);


TRUNCATE TABLE g1.part;
TRUNCATE TABLE g1.supplier;
TRUNCATE TABLE g1.partsupp;
TRUNCATE TABLE g1.customer;
TRUNCATE TABLE g1.orders;
TRUNCATE TABLE g1.lineitem;
TRUNCATE TABLE g1.nation;
TRUNCATE TABLE g1.region;

ALTER SESSION SET nls_date_format='YYYY-MM-DD';

INSERT /*+ APPEND */ INTO  g1.part     SELECT * FROM g1.ext_part;
INSERT /*+ APPEND */ INTO  g1.supplier SELECT * FROM g1.ext_supplier;
INSERT /*+ APPEND */ INTO  g1.partsupp SELECT * FROM g1.ext_partsupp;
INSERT /*+ APPEND */ INTO  g1.customer SELECT * FROM g1.ext_customer;
INSERT /*+ APPEND */ INTO  g1.orders   SELECT * FROM g1.ext_orders;
INSERT /*+ APPEND */ INTO  g1.lineitem SELECT * FROM g1.ext_lineitem;
INSERT /*+ APPEND */ INTO  g1.nation   SELECT * FROM g1.ext_nation;
INSERT /*+ APPEND */ INTO  g1.region   SELECT * FROM g1.ext_region;


--- (Just in case) Reloading the data
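Before dropping anything, it helps to check which constraints actually exist; a quick dictionary sketch, assuming the tpch20 schema used below:

SELECT table_name, constraint_type, constraint_name
  FROM dba_constraints
 WHERE owner = 'TPCH20'
 ORDER BY table_name, constraint_type;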

ALTER TABLE tpch20.lineitem DROP CONSTRAINT chk_lineitem_ship_rcpt;      -- check(l_shipdate <= l_receiptdate)
ALTER TABLE tpch20.lineitem DROP CONSTRAINT chk_lineitem_discount;       -- check(l_discount >= 0.00 AND l_discount <= 1.00)
ALTER TABLE tpch20.lineitem DROP CONSTRAINT chk_lineitem_tax;            -- check(l_tax >= 0)
ALTER TABLE tpch20.lineitem DROP CONSTRAINT chk_lineitem_extendedprice;  -- check(l_extendedprice >= 0)
ALTER TABLE tpch20.lineitem DROP CONSTRAINT chk_lineitem_quantity;       -- check(l_quantity >= 0)
ALTER TABLE tpch20.orders   DROP CONSTRAINT chk_orders_totalprice;       -- check(o_totalprice >= 0)
ALTER TABLE tpch20.partsupp DROP CONSTRAINT chk_partsupp_supplycost;     -- check(ps_supplycost >= 0)
ALTER TABLE tpch20.partsupp DROP CONSTRAINT chk_partsupp_availqty;       -- check(ps_availqty >= 0)
ALTER TABLE tpch20.part     DROP CONSTRAINT chk_part_retailprice;        -- check(p_retailprice >= 0)
ALTER TABLE tpch20.part     DROP CONSTRAINT chk_part_size;               -- check(p_size >= 0)

ALTER TABLE tpch20.nation   DROP CONSTRAINT chk_nation_nationkey;        -- check(n_nationkey >= 0)
ALTER TABLE tpch20.region   DROP CONSTRAINT chk_region_regionkey;        -- check(r_regionkey >= 0)
ALTER TABLE tpch20.partsupp DROP CONSTRAINT chk_partsupp_partkey;        -- check(ps_partkey >= 0)
ALTER TABLE tpch20.customer DROP CONSTRAINT chk_customer_custkey;        -- check(c_custkey >= 0)
ALTER TABLE tpch20.supplier DROP CONSTRAINT chk_supplier_suppkey;        -- check(s_suppkey >= 0)
ALTER TABLE tpch20.part     DROP CONSTRAINT chk_part_partkey;            -- check(p_partkey >= 0)

ALTER TABLE tpch20.region   DROP CONSTRAINT pk_region;                   -- primary key(r_regionkey)
ALTER TABLE tpch20.nation   DROP CONSTRAINT pk_nation;                   -- primary key(n_nationkey)
ALTER TABLE tpch20.lineitem DROP CONSTRAINT pk_lineitem;                 -- primary key(l_linenumber, l_orderkey)
ALTER TABLE tpch20.orders   DROP CONSTRAINT pk_orders;                   -- primary key(o_orderkey)
ALTER TABLE tpch20.customer DROP CONSTRAINT pk_customer;                 -- primary key(c_custkey)
ALTER TABLE tpch20.partsupp DROP CONSTRAINT pk_partsupp;                 -- primary key(ps_partkey, ps_suppkey)
ALTER TABLE tpch20.supplier DROP CONSTRAINT pk_supplier;                 -- primary key(s_suppkey)
ALTER TABLE tpch20.part     DROP CONSTRAINT pk_part;                     -- primary key(p_partkey)


-- 1.4.2.3


ALTER TABLE tpch20.lineitem DROP CONSTRAINT fk_lineitem_partsupp;  -- FOREIGN KEY(l_partkey, l_suppkey) REFERENCES tpch20.partsupp(ps_partkey, ps_suppkey)
ALTER TABLE tpch20.lineitem DROP CONSTRAINT fk_lineitem_supplier;  -- FOREIGN KEY(l_suppkey) REFERENCES tpch20.supplier(s_suppkey)
ALTER TABLE tpch20.lineitem DROP CONSTRAINT fk_lineitem_part;      -- FOREIGN KEY(l_partkey) REFERENCES tpch20.part(p_partkey)
ALTER TABLE tpch20.lineitem DROP CONSTRAINT fk_lineitem_order;     -- FOREIGN KEY(l_orderkey) REFERENCES tpch20.orders(o_orderkey)
ALTER TABLE tpch20.orders   DROP CONSTRAINT fk_orders_customer;    -- FOREIGN KEY(o_custkey) REFERENCES tpch20.customer(c_custkey)
ALTER TABLE tpch20.customer DROP CONSTRAINT fk_customer_nation;    -- FOREIGN KEY(c_nationkey) REFERENCES tpch20.nation(n_nationkey)
ALTER TABLE tpch20.partsupp DROP CONSTRAINT fk_partsupp_supplier;  -- FOREIGN KEY(ps_suppkey) REFERENCES tpch20.supplier(s_suppkey)
ALTER TABLE tpch20.partsupp DROP CONSTRAINT fk_partsupp_part;      -- FOREIGN KEY(ps_partkey) REFERENCES tpch20.part(p_partkey)

-----

INSERT /*+ APPEND */ INTO  tpch20.region   SELECT * FROM tpch20.ext_region;
INSERT /*+ APPEND */ INTO  tpch20.customer SELECT * FROM tpch20.ext_customer;


ALTER TABLE tpch20.part
    ADD CONSTRAINT pk_part PRIMARY KEY(p_partkey);

ALTER TABLE tpch20.supplier
    ADD CONSTRAINT pk_supplier PRIMARY KEY(s_suppkey);

ALTER TABLE tpch20.partsupp
    ADD CONSTRAINT pk_partsupp PRIMARY KEY(ps_partkey, ps_suppkey);



ALTER TABLE tpch20.customer
    ADD CONSTRAINT pk_customer PRIMARY KEY(c_custkey);

ALTER TABLE tpch20.orders
    ADD CONSTRAINT pk_orders PRIMARY KEY(o_orderkey);

ALTER TABLE tpch20.lineitem
    ADD CONSTRAINT pk_lineitem PRIMARY KEY(l_linenumber, l_orderkey);

ALTER TABLE tpch20.nation
    ADD CONSTRAINT pk_nation PRIMARY KEY(n_nationkey);

ALTER TABLE tpch20.region
    ADD CONSTRAINT pk_region PRIMARY KEY(r_regionkey);

 
 ALTER TABLE tpch20.partsupp
    ADD CONSTRAINT fk_partsupp_part FOREIGN KEY(ps_partkey) REFERENCES tpch20.part(p_partkey);

ALTER TABLE tpch20.partsupp
    ADD CONSTRAINT fk_partsupp_supplier FOREIGN KEY(ps_suppkey) REFERENCES tpch20.supplier(s_suppkey);

ALTER TABLE tpch20.customer
    ADD CONSTRAINT fk_customer_nation FOREIGN KEY(c_nationkey) REFERENCES tpch20.nation(n_nationkey);

ALTER TABLE tpch20.orders
    ADD CONSTRAINT fk_orders_customer FOREIGN KEY(o_custkey) REFERENCES tpch20.customer(c_custkey);

ALTER TABLE tpch20.lineitem
    ADD CONSTRAINT fk_lineitem_order FOREIGN KEY(l_orderkey) REFERENCES tpch20.orders(o_orderkey);

ALTER TABLE tpch20.lineitem
    ADD CONSTRAINT fk_lineitem_part FOREIGN KEY(l_partkey) REFERENCES tpch20.part(p_partkey);

ALTER TABLE tpch20.lineitem
    ADD CONSTRAINT fk_lineitem_supplier FOREIGN KEY(l_suppkey) REFERENCES tpch20.supplier(s_suppkey);

ALTER TABLE tpch20.lineitem
    ADD CONSTRAINT fk_lineitem_partsupp FOREIGN KEY(l_partkey, l_suppkey)
        REFERENCES tpch20.partsupp(ps_partkey, ps_suppkey);

-- 1.4.2.4 - 1

ALTER TABLE tpch20.part
    ADD CONSTRAINT chk_part_partkey CHECK(p_partkey >= 0);

ALTER TABLE tpch20.supplier
    ADD CONSTRAINT chk_supplier_suppkey CHECK(s_suppkey >= 0);

ALTER TABLE tpch20.customer
    ADD CONSTRAINT chk_customer_custkey CHECK(c_custkey >= 0);

ALTER TABLE tpch20.partsupp
    ADD CONSTRAINT chk_partsupp_partkey CHECK(ps_partkey >= 0);

ALTER TABLE tpch20.region
    ADD CONSTRAINT chk_region_regionkey CHECK(r_regionkey >= 0);

ALTER TABLE tpch20.nation
    ADD CONSTRAINT chk_nation_nationkey CHECK(n_nationkey >= 0);

-- 1.4.2.4 - 2

ALTER TABLE tpch20.part
    ADD CONSTRAINT chk_part_size CHECK(p_size >= 0);

ALTER TABLE tpch20.part
    ADD CONSTRAINT chk_part_retailprice CHECK(p_retailprice >= 0);

ALTER TABLE tpch20.partsupp
    ADD CONSTRAINT chk_partsupp_availqty CHECK(ps_availqty >= 0);

ALTER TABLE tpch20.partsupp
    ADD CONSTRAINT chk_partsupp_supplycost CHECK(ps_supplycost >= 0);

ALTER TABLE tpch20.orders
    ADD CONSTRAINT chk_orders_totalprice CHECK(o_totalprice >= 0);

ALTER TABLE tpch20.lineitem
    ADD CONSTRAINT chk_lineitem_quantity CHECK(l_quantity >= 0);

ALTER TABLE tpch20.lineitem
    ADD CONSTRAINT chk_lineitem_extendedprice CHECK(l_extendedprice >= 0);

ALTER TABLE tpch20.lineitem
    ADD CONSTRAINT chk_lineitem_tax CHECK(l_tax >= 0);

-- 1.4.2.4 - 3

ALTER TABLE tpch20.lineitem
    ADD CONSTRAINT chk_lineitem_discount CHECK(l_discount >= 0.00 AND l_discount <= 1.00);

-- 1.4.2.4 - 4

ALTER TABLE tpch20.lineitem
    ADD CONSTRAINT chk_lineitem_ship_rcpt CHECK(l_shipdate <= l_receiptdate);

 


[Data Processing] Spark + Airflow #1


[Background] A simple processing architecture on AWS

  • Storage > Integration > Processing
    • AWS S3 > AWS Glue > Spark on AWS EMR
    • AWS S3: an object storage service offering scalability, data availability, security, and performance
    • AWS Glue: a scalable, serverless data integration service that makes it easy to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development
    • AWS EMR: a cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto

Q. How is local Spark different from Spark on Kubernetes?


When you run a Spark job in client mode through YARN, the driver runs on the
master node and the executors run on the worker nodes.
In cluster mode, of course, the driver also runs somewhere on a worker node.

On Kubernetes, a driver comes up and that driver launches the executor pods. If you submit a
Spark job in client mode, the pod you submit from becomes the driver and new executor pods are spun up.
If you submit in cluster mode, a separate driver pod is started apart from the submitting pod,
and that new driver pod launches the executor pods.
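A minimal illustration of the two modes with spark-submit (the k8s master URL placeholder matches the scripts below):

# client mode: the submitting process (or pod) becomes the driver
$ spark-submit --master k8s://https://{k8sIP}:6443 --deploy-mode client ...

# cluster mode: a separate driver pod is created, and it launches the executor pods
$ spark-submit --master k8s://https://{k8sIP}:6443 --deploy-mode cluster ...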

 

Reference 1: https://blog.banksalad.com/tech/spark-on-kubernetes/

 


Reference 2: https://techblog.woowahan.com/10291/

 


 

 

[Setup] Spark on Kubernetes

  1. Spark binaries from 2.3 onward include the utilities for deploying to k8s.
  2. Build the Spark app into an image.
  3. Submit the image with spark-submit, pointing at the k8s cluster.
  4. The driver and executors are then created automatically.

[Working Scripts]

1. Download the binary and build the image

  • Work on a server that has Docker installed.
# Download the binary (Ozone targets Hadoop 3, so we pick v3.3.x)
$ wget https://dlcdn.apache.org/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz

# Extract the archive
$ tar -zxvf spark-3.3.2-bin-hadoop3.tgz

# Build the image
$ ./spark-3.3.2-bin-hadoop3/bin/docker-image-tool.sh -r {image-repo} -t v1.0 build
Sending build context to Docker daemon  337.7MB
Step 1/18 : ARG java_image_tag=11-jre-slim
Step 2/18 : FROM openjdk:${java_image_tag}
11-jre-slim: Pulling from library/openjdk

## omitted ##

Step 18/18 : USER ${spark_uid}
 ---> Running in 23bbcf422aa4
Removing intermediate container 23bbcf422aa4
 ---> 9fcf542b4d4a
Successfully built 9fcf542b4d4a
Successfully tagged {image-repo}/spark:v1.0

# Verify the build
$ docker images | grep spark

# Push the image
$ ./spark-3.3.2-bin-hadoop3/bin/docker-image-tool.sh -r {image-repo} -t v1.0 push
The push refers to repository [{image-repo}/spark]
b9ca6c484aa7: Pushed 
f775e56e0fcd: Pushed 
d00b1260ea67: Pushed 
acad6e80a726: Pushed 
481ab0385b18: Pushed 
15a249d53e5c: Pushed 
0fb02ac6bd62: Pushed 
8543b7b8afe5: Pushed 
e501578f6db2: Pushed 
f309a4cd9a46: Pushed 
d7802b8508af: Pushed 
e3abdc2e9252: Pushed 
eafe6e032dbd: Pushed 
92a4e8a3140f: Pushed 
v1.0: digest: sha256:f2bf057b9d3c98036bc0b94dc9221faecdb334d6ad16c64b563f01f31847593e size: 3459
{image-repo}/spark-py:v1.0 image not found. Skipping push for this image.
{image-repo}/spark-r:v1.0 image not found. Skipping push for this image.

# Verify the push
$ curl {image-repo}/v2/_catalog?n=1000 |grep spark
$ curl {image-repo}/v2/spark/tags/list
{"name":"spark","tags":["v1.0","base","spark-crystal"]}

 

2. Write the submit shell script

## Extract the cluster info on the master
$ kubectl cluster-info
Kubernetes master is running at https://{k8sIP}:6443
KubeDNS is running at https://{k8sIP}:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://{k8sIP}:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy

$ vi ./k8s_submit.sh
#!/bin/bash
IMAGE_REPO={image-repo}
export SPARK_HOME=./spark-3.3.2-bin-hadoop3

$SPARK_HOME/bin/spark-submit \
    --master k8s://https://{k8sIP}:6443 \
    --deploy-mode cluster \
    --name spark-example \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=3 \
    --conf spark.kubernetes.container.image=$IMAGE_REPO/spark:v1.0 \
    $SPARK_HOME/examples/jars/spark-examples_2.12-3.3.2.jar

## Attempt 1 -> error; see https://m.blog.naver.com/firstpcb/221762669146
$ ./k8s_submit.sh 
23/02/26 15:01:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
### omitted ###  >> Looks like the Hadoop binaries are needed.
$ wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
$ tar -zxvf hadoop-3.3.4.tar.gz
$ cp ./spark-3.3.2-bin-hadoop3/conf/spark-env.sh.template ./spark-3.3.2-bin-hadoop3/conf/spark-env.sh
$ vi ./spark-3.3.2-bin-hadoop3/conf/spark-env.sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/jhhan/spark/hadoop-3.3.4/lib/native # added

## Attempt 2 -> errors again....
$ ./k8s_submit.sh
23/02/26 15:26:43 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
23/02/26 15:26:43 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
Exception in thread "main" org.apache.spark.SparkException: Please specify spark.kubernetes.file.upload.path property.
    at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:330)
    at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:276)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:275)
    at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:188)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:178)
    at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$5(KubernetesDriverBuilder.scala:86)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:91)
    at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:84)
    at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:104)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:248)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:242)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2764)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:242)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:214)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/02/26 15:26:43 INFO ShutdownHookManager: Shutdown hook called
23/02/26 15:26:43 INFO ShutdownHookManager: Deleting directory /tmp/spark-7aa1d18a-8834-42d4-9831-15d80d9779af
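The exception names spark.kubernetes.file.upload.path. Besides the local:// fix tried next, another option (not taken here) is to point that property at a shared location the cluster can reach, so spark-submit can upload the application jar; the HDFS path below is hypothetical:

    --conf spark.kubernetes.file.upload.path=hdfs:///spark-uploads \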

## Attempt 3
## In k8s_submit.sh, prefix the application jar path with local://

## Result
#!/bin/bash
IMAGE_REPO={image-repo}
export SPARK_HOME=/data/jhhan/spark/spark-3.3.2-bin-hadoop3

$SPARK_HOME/bin/spark-submit \
    --master k8s://https://{k8sIP}:6443 \
    --deploy-mode cluster \
    --name spark-example \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=3 \
    --conf spark.kubernetes.container.image=$IMAGE_REPO/spark:v1.0 \
    local://$SPARK_HOME/examples/jars/spark-examples_2.12-3.3.2.jar

## The driver gets deployed, but the run ends in an Error.
$ ./k8s_submit.sh 
23/02/26 15:46:38 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
23/02/26 15:46:38 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
23/02/26 15:46:39 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/data/jhhan/spark/spark-3.3.2-bin-hadoop3/conf) : spark-env.sh
23/02/26 15:46:39 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
     pod name: spark-example-e4f249868c793199-driver
     namespace: default
     labels: spark-app-name -> spark-example, spark-app-selector -> spark-74dc1638e90149b893ef42bd776f3668, spark-role -> driver, spark-version -> 3.3.2
     pod uid: f82c4e16-3aea-4a48-8220-1bfe06e47d0b
     creation time: 2023-02-26T06:46:39Z
     service account name: default
     volumes: spark-local-dir-1, spark-conf-volume-driver, default-token-2dxpc
     node name: N/A
     start time: N/A
     phase: Pending
     container status: N/A
23/02/26 15:46:39 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
     pod name: spark-example-e4f249868c793199-driver
     namespace: default
     labels: spark-app-name -> spark-example, spark-app-selector -> spark-74dc1638e90149b893ef42bd776f3668, spark-role -> driver, spark-version -> 3.3.2
     pod uid: f82c4e16-3aea-4a48-8220-1bfe06e47d0b
     creation time: 2023-02-26T06:46:39Z
     service account name: default
     volumes: spark-local-dir-1, spark-conf-volume-driver, default-token-2dxpc
     node name: N/A
     start time: N/A
     phase: Pending
     container status: N/A
23/02/26 15:46:39 INFO LoggingPodStatusWatcherImpl: Waiting for application spark-example with submission ID default:spark-example-e4f249868c793199-driver to finish...
23/02/26 15:46:39 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
     pod name: spark-example-e4f249868c793199-driver
     namespace: default
     labels: spark-app-name -> spark-example, spark-app-selector -> spark-74dc1638e90149b893ef42bd776f3668, spark-role -> driver, spark-version -> 3.3.2
     pod uid: f82c4e16-3aea-4a48-8220-1bfe06e47d0b
     creation time: 2023-02-26T06:46:39Z
     service account name: default
     volumes: spark-local-dir-1, spark-conf-volume-driver, default-token-2dxpc
     node name: k8s-19-worker09
     start time: N/A
     phase: Pending
     container status: N/A
23/02/26 15:46:39 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
     pod name: spark-example-e4f249868c793199-driver
     namespace: default
     labels: spark-app-name -> spark-example, spark-app-selector -> spark-74dc1638e90149b893ef42bd776f3668, spark-role -> driver, spark-version -> 3.3.2
     pod uid: f82c4e16-3aea-4a48-8220-1bfe06e47d0b
     creation time: 2023-02-26T06:46:39Z
     service account name: default
     volumes: spark-local-dir-1, spark-conf-volume-driver, default-token-2dxpc
     node name: k8s-19-worker09
     start time: 2023-02-26T06:46:40Z
     phase: Pending
     container status: 
         container name: spark-kubernetes-driver
         container image: {image-repo}/spark:v1.0
         container state: waiting
         pending reason: ContainerCreating
23/02/26 15:46:40 INFO LoggingPodStatusWatcherImpl: Application status for spark-74dc1638e90149b893ef42bd776f3668 (phase: Pending)
23/02/26 15:46:41 INFO LoggingPodStatusWatcherImpl: Application status for spark-74dc1638e90149b893ef42bd776f3668 (phase: Pending)
23/02/26 15:46:41 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
     pod name: spark-example-e4f249868c793199-driver
     namespace: default
     labels: spark-app-name -> spark-example, spark-app-selector -> spark-74dc1638e90149b893ef42bd776f3668, spark-role -> driver, spark-version -> 3.3.2
     pod uid: f82c4e16-3aea-4a48-8220-1bfe06e47d0b
     creation time: 2023-02-26T06:46:39Z
     service account name: default
     volumes: spark-local-dir-1, spark-conf-volume-driver, default-token-2dxpc
     node name: k8s-19-worker09
     start time: 2023-02-26T06:46:40Z
     phase: Pending
     container status: 
         container name: spark-kubernetes-driver
         container image: {image-repo}/spark:v1.0
         container state: waiting
         pending reason: ContainerCreating
23/02/26 15:46:42 INFO LoggingPodStatusWatcherImpl: Application status for spark-74dc1638e90149b893ef42bd776f3668 (phase: Pending)
23/02/26 15:46:43 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
     pod name: spark-example-e4f249868c793199-driver
     namespace: default
     labels: spark-app-name -> spark-example, spark-app-selector -> spark-74dc1638e90149b893ef42bd776f3668, spark-role -> driver, spark-version -> 3.3.2
     pod uid: f82c4e16-3aea-4a48-8220-1bfe06e47d0b
     creation time: 2023-02-26T06:46:39Z
     service account name: default
     volumes: spark-local-dir-1, spark-conf-volume-driver, default-token-2dxpc
     node name: k8s-19-worker09
     start time: 2023-02-26T06:46:40Z
     phase: Running
     container status: 
         container name: spark-kubernetes-driver
         container image: {image-repo}/spark:v1.0
         container state: running
         container started at: 2023-02-26T06:46:42Z
23/02/26 15:46:43 INFO LoggingPodStatusWatcherImpl: Application status for spark-74dc1638e90149b893ef42bd776f3668 (phase: Running)
23/02/26 15:46:44 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
     pod name: spark-example-e4f249868c793199-driver
     namespace: default
     labels: spark-app-name -> spark-example, spark-app-selector -> spark-74dc1638e90149b893ef42bd776f3668, spark-role -> driver, spark-version -> 3.3.2
     pod uid: f82c4e16-3aea-4a48-8220-1bfe06e47d0b
     creation time: 2023-02-26T06:46:39Z
     service account name: default
     volumes: spark-local-dir-1, spark-conf-volume-driver, default-token-2dxpc
     node name: k8s-19-worker09
     start time: 2023-02-26T06:46:40Z
     phase: Failed
     container status: 
         container name: spark-kubernetes-driver
         container image: {image-repo}/spark:v1.0
         container state: terminated
         container started at: 2023-02-26T06:46:42Z
         container finished at: 2023-02-26T06:46:44Z
         exit code: 101
         termination reason: Error
23/02/26 15:46:44 INFO LoggingPodStatusWatcherImpl: Application status for spark-74dc1638e90149b893ef42bd776f3668 (phase: Failed)
23/02/26 15:46:44 INFO LoggingPodStatusWatcherImpl: Container final statuses:
     container name: spark-kubernetes-driver
     container image: {image-repo}/spark:v1.0
     container state: terminated
     container started at: 2023-02-26T06:46:42Z
     container finished at: 2023-02-26T06:46:44Z
     exit code: 101
     termination reason: Error
23/02/26 15:46:44 INFO LoggingPodStatusWatcherImpl: Application spark-example with submission ID default:spark-example-e4f249868c793199-driver finished
23/02/26 15:46:44 INFO ShutdownHookManager: Shutdown hook called
23/02/26 15:46:44 INFO ShutdownHookManager: Deleting directory /tmp/spark-86286d2a-0416-43ff-990a-6b1bcf5ccebc

### Dumping the pod log shows the following.
$ kubectl logs spark-example-e4f249868c793199-driver
## omitted ##
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.10.192.30 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi local:///data/jhhan/spark/spark-3.3.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.3.2.jar
23/02/26 06:46:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/02/26 06:46:44 WARN DependencyUtils: Local jar /data/jhhan/spark/spark-3.3.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.3.2.jar does not exist, skipping.
23/02/26 06:46:44 WARN DependencyUtils: Local jar /data/jhhan/spark/spark-3.3.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.3.2.jar does not exist, skipping.
Error: Failed to load class org.apache.spark.examples.SparkPi.
23/02/26 06:46:44 INFO ShutdownHookManager: Shutdown hook called
23/02/26 06:46:44 INFO ShutdownHookManager: Deleting directory /tmp/spark-39d25f0b-a40d-4156-98d3-a734916afd9d

## Judging by the log, the local:// scheme apparently refers to a path inside the pod, not the machine that ran spark-submit or the k8s worker where the pod landed.
## https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1473
## In other words, it seems most convenient to either bake the application jar into the image, or keep the app jars on a PVC and run through that.
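As a sketch of that PVC idea (claim name and mount path are hypothetical), Spark on k8s can mount a PersistentVolumeClaim into the driver and executor pods via conf, and the application jar is then referenced as local:// under the mount path:

    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.jars.options.claimName=spark-app-jars \
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.jars.mount.path=/opt/spark/work-jars \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.jars.options.claimName=spark-app-jars \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.jars.mount.path=/opt/spark/work-jars \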

## Attempt 4
$ vi k8s_submit.sh
#!/bin/bash
IMAGE_REPO={image-repo}
export SPARK_HOME=/data/jhhan/spark/spark-3.3.2-bin-hadoop3

$SPARK_HOME/bin/spark-submit \
    --master k8s://https://{k8sIP}:6443 \
    --deploy-mode cluster \
    --name spark-example \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=3 \
    --conf spark.kubernetes.container.image=$IMAGE_REPO/spark:v1.0 \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar ## path changed (judging from the Dockerfile used for the build above, the jar should exist there)

## cat /home/data/hanjh/spark/spark-3.3.2-bin-hadoop3/kubernetes/dockerfiles/spark/Dockerfile

$ ./k8s_submit.sh
## The driver is up, but the log shows the service account cannot access the API.
$ kubectl logs pod/spark-example-086b82868cb4ec97-driver

23/02/26 07:51:58 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: External scheduler cannot be instantiated
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:3002)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:573)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2714)
    at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.base/java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.makeExecutorPodsAllocator(KubernetesClusterManager.scala:158)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:113)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2996)
    ... 19 more
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/default/pods/spark-example-086b82868cb4ec97-driver. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "spark-example-086b82868cb4ec97-driver" is forbidden: User "system:serviceaccount:default:default" cannot get resource "pods" in API group "" in the namespace "default".
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:610)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:487)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:457)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:698)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:184)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:151)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:83)
    at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:79)
    at scala.Option.map(Option.scala:230)
    at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:78)
    ... 26 more
23/02/26 07:51:58 INFO SparkUI: Stopped Spark web UI at http://spark-example-086b82868cb4ec97-driver-svc.default.svc:4040
23/02/26 07:51:58 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!

## Attempt 5
# https://stackoverflow.com/questions/55498702/how-to-fix-forbiddenconfigured-service-account-doesnt-have-access-with-spark
# Spark must be granted permission to use the k8s API.
# Create the resources below, then set the conf so the job uses the service account.
$ vi spark_base.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-cluster-role
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list", "create", "delete"]
- apiGroups: [""] # "" indicates the core API group
  resources: ["services"]
  verbs: ["get", "create", "delete"]
- apiGroups: [""] # "" indicates the core API group
  resources: ["configmaps"]
  verbs: ["get", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-cluster-role-binding
subjects:
- kind: ServiceAccount
  name: spark
  namespace: default
roleRef:
  kind: ClusterRole
  name: spark-cluster-role
  apiGroup: rbac.authorization.k8s.io

$ kubectl apply -f spark_base.yaml

$ vi k8s_submit.sh
## Add the config below
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark

$ ./k8s_submit.sh
## Success!!!!!!

## Checking after the run, the driver's status shows Completed like this.
NAME                                        READY   STATUS      RESTARTS   AGE     IP             NODE              NOMINATED NODE   READINESS GATES
pod/pod-1                                   1/1     Running     0          3d15h   10.10.220.66   k8s-19-master02   <none>           <none>
pod/spark-example-2a262386904c5997-driver   0/1     Completed   0          42s     10.10.192.33   k8s-19-worker09   <none>           <none>

NAME                                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE    SELECTOR
service/kubernetes                                  ClusterIP   10.244.0.1      <none>        443/TCP                      332d   <none>
service/spark-example-2a262386904c5997-driver-svc   ClusterIP   None            <none>        7078/TCP,7079/TCP,4040/TCP   42s    spark-app-name=spark-example,spark-app-selector=spark-0f7944e3ad1f40758cdfcac6c1d9b2d9,spark-role=driver,spark-version=3.3.2
service/svc-1                                       NodePort    10.244.117.41   <none>        9000:31788/TCP               5d9h   app=pod

[OS] Setting up a Windows environment for developer convenience - Chocolatey


A Linux environment would of course be ideal for development,

but sometimes you need to set up a PC running Windows OS for work convenience,

or you need to do light data work or other IT tasks on a machine that isn't a dev PC!!

 

1. What is Chocolatey (choco), the package manager for Windows?

< Chocolatey (a.k.a. choco) >

 - Chocolatey is a package manager for Windows: it installs, updates, and removes the programs (packages) you need

 - Like yum/apt(-get) on Linux or Homebrew on Mac

- Official site: https://chocolatey.org/

 


2. Installation

1) Copy the install command

- Home page > Try It Now (top right) > under Step 2, "Now run the following command", copy the command (clipboard icon)

2) Run it in PowerShell

 - Windows search, or Shift + right-click in Explorer > run PowerShell as administrator > run the copied command

 - Paste the copied command as-is and hit Enter!!!!! > close the PowerShell window

3) Using CHOCOLATEY

3.1) Chocolatey home > Find Packages: look up the package you need!

3.2) Agree to the terms and disclaimer > I Understand

3.3) Type the package name into the search box > copy the install command, paste it into a PowerShell window, and you're done!
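For example, once the install finishes, a typical choco session from an elevated PowerShell looks like this (git is just a sample package):

choco search git        # look up a package from the command line
choco install git -y    # install it; -y skips the confirmation prompt
choco upgrade git -y    # upgrade it later
choco uninstall git -y  # remove it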

 


[Kube] Kubeflow, WHO ARE YOU?


 

1. Overview: What is KUBEFLOW?

  KUBEFLOW = a portmanteau of "Kube"rnetes + ML "FLOW"
  In other words, a Kubernetes-based Machine Learning FLOW platform (or AI platform)

 

 

🔵 Automates the machine learning (ML) workflow on Kubernetes,

and provides the tools and environments needed to develop

machine learning models at each step of the ML workflow

→ ML models become easy to develop and deploy, and easy to scale and port

※ Machine learning workflow: the entire process of developing an ML model
   (data preprocessing, exploratory data analysis, data transformation, hyperparameter tuning, model training, model deployment)

 

2. Architecture and Structure

 

 


[Sybase IQ] Installing the Demo


Installing SYBASE IQ

  • While installing and using Sybase IQ, I found there are hardly any guides and the community is small, so I'm writing this down in the hope that it helps someone.

0. Environment

  • OS (Linux)
    [root@hd8 ~]# uname -a
    Linux hd8 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Binary
  • SAP home page (https://www.sap.com/korea/products/sybase-iq-big-data-management.html)
  • Click Request a demo (Software Free Trials) and submit the request form to download the demo binary
  • This guide uses the file "sybase_bin64.tar"
  • Extract the tar

1. Installation

[root@g1 ~]# cd /data/sybase/SYBASE_HOME/

[root@g1 SYBASE_HOME]# ls -al
total 24
drwxr-xr-x 17 root   root   4096 Feb 16 14:10 .
drwxrwxr-x  4 sybase root     91 Feb 16 14:02 ..
drwxrwxr-x 14 root   root    170 Feb 16 14:11 COCKPIT-4
drwxrwxr-x 17 root   root    262 Feb 16 14:11 IQ-16_1
lrwxrwxrwx  1 root   root     21 Feb 16 14:05 IQ.csh -> ./IQ-16_1/IQ-16_1.csh
lrwxrwxrwx  1 root   root     20 Feb 16 14:05 IQ.sh -> ./IQ-16_1/IQ-16_1.sh
drwxrwxr-x 15 root   root    197 Feb 16 14:06 OCS-16_0
-rwxr-xr-x  1 root   root   1172 Feb 16 14:10 SYBASE.csh
-rw-r--r--  1 root   root    764 Feb 16 14:10 SYBASE.env
-rwxr-xr-x  1 root   root   1010 Feb 16 14:10 SYBASE.sh
drwxrwxr-x  6 root   root     59 Feb 16 14:09 SYSAM-2_0
drwxr-xr-x  2 root   root     24 Feb 16 14:04 Sybase_Install_Registry
drwxrwxr-x 59 root   root   4096 Feb 16 14:05 charsets
drwxrwxr-x  3 root   root     21 Feb 16 14:05 collate
drwxrwxr-x  2 root   root     65 Feb 16 14:06 config
drwxrwxr-x  9 root   root    162 Feb 16 14:09 jConnect-16_0
drwxr-xr-x  4 27940  sybase  146 Aug 12  2015 jre
drwxrwxr-x  3 root   root     18 Feb 16 14:09 jutils-3_0
drwxrwxr-x  4 root   root     55 Feb 16 14:06 locales
drwxr-xr-x  2 root   root   4096 Feb 16 14:14 log
drwxrwxr-x  4 root   root     62 Feb 16 14:04 shared
drwxr-xr-x  4 root   root     33 Feb 16 14:04 sybuninstall

[root@g1 SYBASE_HOME]# cd ./IQ-16_1/
[root@g1 IQ-16_1]# ls -al
total 36
drwxrwxr-x 17 root root  262 Feb 16 14:11 .
drwxr-xr-x 17 root root 4096 Feb 16 14:10 ..
drwxrwxr-x  3 root root   26 Feb 16 14:08 DBACOCKPIT
-rwxr-xr-x  1 root root 1617 Feb 16 14:05 IQ-16_1.csh
-rwxr-xr-x  1 root root 1649 Feb 16 14:05 IQ-16_1.sh
drwxrwxr-x  2 root root   28 Feb 16 14:04 Sysam
drwxrwxr-x  4 root root 4096 Feb 17 17:29 bin64
drwxrwxr-x  4 root root  135 Feb 16 14:08 demo
drwxr-xr-x  2 root root   34 Feb 16 14:11 install
drwxrwxr-x  2 root root 4096 Feb 16 14:08 java
drwxrwxr-x  2 root root 4096 Feb 16 14:08 lib64
drwxrwxrwx  2 root root 4096 Feb 21 14:50 logfiles
drwxrwxr-x  2 root root  156 Feb 16 14:08 lsunload
drwxrwxr-x  2 root root 4096 Feb 16 14:08 res
drwxrwxr-x  5 root root   56 Feb 16 14:08 samples
drwxrwxr-x  2 root root  115 Feb 16 14:08 scripts
drwxrwxr-x 11 root root  122 Feb 16 14:08 sdk
lrwxrwxrwx  1 root root    9 Feb 16 14:08 shared -> ../shared
drwxrwxr-x  2 root root   25 Feb 16 14:08 tix
drwxrwxr-x  2 root root 4096 Feb 16 14:08 unload

Export the SYBASE path variables

[root@g1 bin64]# export IQDIR16='/data/sybase/SYBASE_HOME/IQ-16_1/'
[root@g1 bin64]# export SYBASE='/data/sybase/SYBASE_HOME/'

# Demo-build shell script - set the paths to dbisqlc, stop_iq, start_iq, dbstop
[root@g1 bin64]# vi ./mkiqdemo.sh

# Build the IQ demo
# Run the mkiqdemo.sh shell script: -dba account (user ID), -pwd password (6+ characters)
[root@g1 bin64]# sh ./mkiqdemo.sh -dba DBA -pwd sybase

Warning: You are running this script as 'root'.  
Sybase strongly recommend that servers not be run with root privileges.

This script will create the iqdemo database in the current  
directory. An existing iqdemo database in this directory  
will need to be shutdown and over-written.

# Answer Y to continue (it must be an uppercase Y!)
Continue <Y/N>? Y

Checking system ...
The following 1 server(s) are owned by 'root'

## Owner PID Started CPU Time Additional Information

---

1: root 31692 14:58 00:00:02 SVR:bld_demo_9000 DB:none PORT:9000
/data/sybase/SYBASE_HOME/IQ-16_1/bin64/iqsrv16 @bld_demo_9000.cfg -gc 20 -gl all -ti 4400 -gn 25 -o /data/sybase/SYBASE_HOME/IQ-16_1/logfiles/bld_dem
--

Please note that 'stop_iq' will shut down a server completely
without regard for users, connections, or load process status.
For more control, use the 'dbstop' utility, which has options
that control stopping servers based on active connections.


Do you want to stop the server displayed above <Y/N>? Y

Shutting down server (31692) ...  
Checkpointing server (31692) ...  
Server shutdown.

-rw-r--r-- 1 root root 546 Feb 21 14:59 iqdemo.cfg
-r--r--r-- 1 root root 4505600 Feb 21 14:59 iqdemo.db
-rw-r--r-- 1 root root 104857600 Feb 21 14:59 iqdemo.iq
-rw-r--r-- 1 root root 41542 Feb 21 14:59 iqdemo.iqmsg
-rw-r--r-- 1 root root 26214400 Feb 21 14:59 iqdemo.iqtmp
-rw-r--r-- 1 root root 469 Feb 21 14:59 iqdemo.lmp
-r-------- 1 root root 327680 Feb 21 14:59 iqdemo.log
-rw-r--r-- 1 root root 104857600 Feb 21 14:59 iqdemo_main.iq

The iqdemo database already exists. Overwrite <Y/N>? Y

Removing current iqdemo database ...  
Removed.

Starting Server ... bld_demo_9332

Starting server bld\_demo\_9332 on g1 at port 9332 (02/21 15:01:01)

Run Directory : /data/sybase/SYBASE_HOME/IQ-16_1/bin64
Server Executable : /data/sybase/SYBASE_HOME/IQ-16_1/bin64/iqsrv16
Server Output Log : /data/sybase/SYBASE_HOME/IQ-16_1/logfiles/bld_demo_9332.0001.srvlog
Server Version : 16.1.030.1031/sp03
Open Client Version : N/A
User Parameters : '@bld_demo_9332.cfg'
Default Parameters : -gc 20 -gl all -ti 4400 -gn 25

I. 02/21 15:01:02.757645 SAP IQ  
I. 02/21 15:01:02.757719 Version 16.1  
I. 02/21 15:01:02.757744 (64bit mode)  
I. 02/21 15:01:02.757776 Copyright 1992-2017 by SAP AG or an SAP affiliate company. All rights reserved  
I. 02/21 15:01:02.757795 Copyright (c) 2018 SAP SE or an SAP affiliate company.  
I. 02/21 15:01:02.757811 All rights reserved.  
I. 02/21 15:01:02.757833 Use of this software is governed by the SAP Software Use Rights Agreement.  
I. 02/21 15:01:02.757854 Refer to http://www.sap.com/about/agreements.html.
I. 02/21 15:01:02.757869  
I. 02/21 15:01:02.757899 Processors detected: 8 logical processor(s) on 4 core(s) on 1 physical processor(s)  
I. 02/21 15:01:02.757941 This server is licensed to use: all logical processors in the system  
I. 02/21 15:01:02.757964 Processors in use by server: 8 logical processor(s) on 4 core(s) on 1 physical processor(s)  
I. 02/21 15:01:02.757999 Running Linux 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020 on X86_64
I. 02/21 15:01:02.758040 Server built for X86_64 processor architecture
I. 02/21 15:01:02.761191 49720K of memory used for caching  
I. 02/21 15:01:02.761237 Minimum cache size: 49728K, maximum cache size: 262144K  
I. 02/21 15:01:02.761276 Using a maximum page size of 4096 bytes  
I. 02/21 15:01:02.763989 Multiprogramming level: 25  
I. 02/21 15:01:02.764016 Automatic tuning of multiprogramming level is disabled

=============================================================
IQ server starting with:
10 connections ( -gm )
26 cmd resources ( -iqgovern )
460 threads ( -iqmt )
512 Kb thread stack size ( -iqtss )
235520 Kb thread memory size ( -iqmt * -iqtss )
8 IQ number of cpus ( -iqnumbercpus )
0 MB maximum size of IQMSG file ( -iqmsgsz )
0 copies of IQMSG file archives ( -iqmsgnum )
64 MB maximum size of main buffer cache ( -iqmc )
64 MB maximum size of temp buffer cache ( -iqtc )
2048 MB maximum size of large memory pool ( -iqlm )
0 MB maximum size of heap memory ( -iqmem )
2048 MB maximum size of RLV memory ( -iqrlvmem )
=============================================================

I. 02/21 15:01:02.785503 Database server started at Mon Feb 21 2022 15:01  
I. 02/21 15:01:02.786100 Trying to start SharedMemory link ...  
I. 02/21 15:01:02.786343 SharedMemory link started successfully  
I. 02/21 15:01:02.786392 Trying to start TCPIP link ...  
I. 02/21 15:01:02.787082 Starting on port 9332  
I. 02/21 15:01:07.787128 TCPIP link started successfully  
I. 02/21 15:01:07.787546 Now accepting requests  
New process id is 526

Server started successfully

Started.  
Connecting to server via TCPIP ...  
Using: uid=DBA;pwd=sql123;eng=bld_demo_9332;dbn=utility_db;links=tcpip{host=g1;port=9332}

Creating database ...  
Demo database created.

Shutting down server ...  
SQL Anywhere Stop Server Utility Version 17.0.9.1031

-rw-r--r-- 1 root root 546 Feb 21 15:01 iqdemo.cfg
-r--r--r-- 1 root root 4505600 Feb 21 15:01 iqdemo.db
-rw-r--r-- 1 root root 104857600 Feb 21 15:01 iqdemo.iq
-rw-r--r-- 1 root root 41538 Feb 21 15:01 iqdemo.iqmsg
-rw-r--r-- 1 root root 26214400 Feb 21 15:01 iqdemo.iqtmp
-rw-r--r-- 1 root root 469 Feb 21 15:01 iqdemo.lmp
-r-------- 1 root root 327680 Feb 21 15:01 iqdemo.log
-rw-r--r-- 1 root root 104857600 Feb 21 15:01 iqdemo_main.iq

# Database created successfully!!
Demo database created successfully

# Start the server from the iqdemo.cfg file
[root@g1 bin64]# ./start_iq @iqdemo.cfg iqdemo.db

Starting server g1_iqdemo on g1 at port 2638 (02/25 22:23:06)

Run Directory : /home/sybase/SAP/IQ-16_1/bin64
Server Executable : /home/sybase/SAP/IQ-16_1/bin64/iqsrv16
Server Output Log : /home/sybase/SAP/IQ-16_1/logfiles/g1_iqdemo.0001.srvlog
Server Version : 16.1.030.1031/sp03
Open Client Version : N/A
User Parameters : '@iqdemo.cfg' 'iqdemo.db'
Default Parameters : -ti 4400 -gn 25

I. 02/25 22:23:08.440692 SAP IQ  
I. 02/25 22:23:08.440752 Version 16.1  
I. 02/25 22:23:08.440777 (64bit mode)  
I. 02/25 22:23:08.440806 Copyright 1992-2017 by SAP AG or an SAP affiliate company. All rights reserved  
I. 02/25 22:23:08.440836 Copyright (c) 2018 SAP SE or an SAP affiliate company.  
I. 02/25 22:23:08.440857 All rights reserved.  
I. 02/25 22:23:08.440880 Use of this software is governed by the SAP Software Use Rights Agreement.  
I. 02/25 22:23:08.440904 Refer to http://www.sap.com/about/agreements.html.
I. 02/25 22:23:08.440925  
I. 02/25 22:23:08.443179 Processors detected: 8 logical processor(s) on 4 core(s) on 1 physical processor(s)  
I. 02/25 22:23:08.443661 This server is licensed to use: all logical processors in the system  
I. 02/25 22:23:08.443693 Processors in use by server: 8 logical processor(s) on 4 core(s) on 1 physical processor(s)  
I. 02/25 22:23:08.444032 Running Linux 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020 on X86_64
I. 02/25 22:23:08.444070 Server built for X86_64 processor architecture
I. 02/25 22:23:08.539513 49720K of memory used for caching  
I. 02/25 22:23:08.539553 Minimum cache size: 49728K, maximum cache size: 262144K  
I. 02/25 22:23:08.539575 Using a maximum page size of 4096 bytes  
I. 02/25 22:23:08.542228 Multiprogramming level: 25  
I. 02/25 22:23:08.542263 Automatic tuning of multiprogramming level is disabled

=============================================================
IQ server starting with:
10 connections ( -gm )
26 cmd resources ( -iqgovern )
460 threads ( -iqmt )
512 Kb thread stack size ( -iqtss )
235520 Kb thread memory size ( -iqmt * -iqtss )
8 IQ number of cpus ( -iqnumbercpus )
0 MB maximum size of IQMSG file ( -iqmsgsz )
0 copies of IQMSG file archives ( -iqmsgnum )
64 MB maximum size of main buffer cache ( -iqmc )
64 MB maximum size of temp buffer cache ( -iqtc )
2048 MB maximum size of large memory pool ( -iqlm )
0 MB maximum size of heap memory ( -iqmem )
2048 MB maximum size of RLV memory ( -iqrlvmem )
=============================================================

I. 02/25 22:23:08.816707 Starting database "iqdemo" (/home/sybase/SAP/IQ-16_1/bin64/iqdemo.db) at Fri Feb 25 2022 22:23
I. 02/25 22:23:08.828570 Database recovery in progress  
I. 02/25 22:23:08.828598 Last checkpoint at Tue Feb 22 2022 11:35  
I. 02/25 22:23:08.828643 Checkpoint log...  
I. 02/25 22:23:09.223211 Transaction log: iqdemo.log...  
I. 02/25 22:23:09.299830 Rollback log...  
I. 02/25 22:23:09.327415 Checkpointing...  
I. 02/25 22:23:09.327483 Starting checkpoint of "iqdemo" (iqdemo.db) at Fri Feb 25 2022 22:23  
I. 02/25 22:23:09.482760 Finished checkpoint of "iqdemo" (iqdemo.db) at Fri Feb 25 2022 22:23  
I. 02/25 22:23:09.482800 Recovery complete  
I. 02/25 22:23:09.706122 Update previous log GUID to: 8d62f71a-9372-11ec-8000-ff2b62b1fccc  
Update current log GUID to :f3142ad8-96d4-11ec-8000-c69e869a5e19  
I. 02/25 22:23:09.807556 Database "iqdemo" (iqdemo.db) started at Fri Feb 25 2022 22:23  
I. 02/25 22:23:09.846863 IQ Server g1_iqdemo.
I. 02/25 22:23:09.847295 Starting checkpoint of "iqdemo" (iqdemo.db) at Fri Feb 25 2022 22:23  
I. 02/25 22:23:10.024374 Finished checkpoint of "iqdemo" (iqdemo.db) at Fri Feb 25 2022 22:23  
I. 02/25 22:23:10.032852 Database server started at Fri Feb 25 2022 22:23  
I. 02/25 22:23:10.032967 Trying to start SharedMemory link ...  
I. 02/25 22:23:10.033315 SharedMemory link started successfully  
I. 02/25 22:23:10.033364 Trying to start TCPIP link ...  
I. 02/25 22:23:10.033992 Starting on port 2638  
I. 02/25 22:23:15.034120 TCPIP link started successfully  
I. 02/25 22:23:15.056954 Now accepting requests  
New process id is 19425

Server started successfully
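
With the server now accepting requests, a quick connectivity check from the shell is worthwhile. A minimal sketch, assuming dbisql and dbstop are on the PATH via the environment above and that DBA/sybase are the credentials passed to mkiqdemo.sh (the demo defaults may differ):

# Run a trivial query over TCP/IP against the running iqdemo server
[root@g1 bin64]# dbisql -nogui -c "uid=DBA;pwd=sybase;eng=g1_iqdemo;dbn=iqdemo;links=tcpip{host=g1;port=2638}" "SELECT 'connected';"

# Stop the server gracefully when done; unlike stop_iq,
# dbstop can take active connections into account
[root@g1 bin64]# dbstop -c "uid=DBA;pwd=sybase;eng=g1_iqdemo;links=tcpip{host=g1;port=2638}"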

[Tibero] java.sql.SQLException: JDBC-5072, a Tibero Bug

IT_Engineer/Data Base|2022. 3. 1. 18:33

While cleaning up test tables with DROP TABLE,

 

one table refused to drop and threw the error below.

 

[18:14:20.215]java.sql.SQLException: JDBC-5072:Failure converting NUMBER to or from a native type.

 

What is this? Slightly thrown by an error I had never seen before, I searched around, and it appears to be a Tibero bug.

It seems to be caused by a ROW COUNT becoming NULL.
When I queried for OBJECT IDs with a NULL ROW COUNT restricted to that account's tables, nothing came back,
so, since no OBJECT ID turned up among the account's own tables, I decided to hunt for the offending objects across all tables.

  1. Connect to Tibero
    - On Linux: tbsql sys/tibero
    - When using a utility such as TB Studio or TBADMIN: connect with a DBA account

   2. Find the OBJECT IDs whose ROW COUNT is NULL

select a.object_id
from dba_objects a,
     (select obj_id, row_cnt from sys._dd_tbl) b
where a.object_id = b.obj_id
  and b.row_cnt is null;

   3. UPDATE the matching OBJECT_IDs

update sys._dd_tbl
set row_cnt = 0
where obj_id in (
    select a.object_id
    from dba_objects a,
         (select obj_id, row_cnt from sys._dd_tbl) b
    where a.object_id = b.obj_id
      and b.row_cnt is null
);

   4. Retry the DROP on the table that raised the error

DROP TABLE {table name};
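
The whole fix can also be applied in one shot from the shell. A minimal sketch, assuming tbsql reads SQL from stdin and using the sys/tibero credentials from step 1; note the UPDATE needs an explicit commit:

# Patch the NULL row counts and commit in a single tbsql session
tbsql sys/tibero <<'SQL'
update sys._dd_tbl
set row_cnt = 0
where obj_id in (select object_id from dba_objects)
  and row_cnt is null;
commit;
SQL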

 


[Docker] Deleting an Image Pushed to Docker Hub

IT_Engineer/Cloud|2022. 2. 22. 18:13

After creating a repository on Docker Hub, let's look at how to delete that repository.

One day, while searching for something with docker search, an image I had made long ago turned up,

so I hurriedly logged in to Docker Hub to delete it.

 

After logging in, the repositories created under my account are listed as follows:

1. Click the repository name

 

2. Click Settings among the top tabs

 

3. Click Delete repository at the very bottom

 

Finally, type the repository name into the confirmation box exactly as shown (copy-paste works) and click the Delete button (a command-line alternative is sketched below)
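
If you prefer the command line, the same deletion can be scripted. A minimal sketch against Docker Hub's v2 HTTP API, which is not an official stable contract and may change; curl and jq are assumed to be installed, and the account/repository names are placeholders:

# Hypothetical account and repository names - replace with your own
USER="myuser"
REPO="old-image"
read -s -p "Docker Hub password: " PASS; echo

# 1) Log in and extract a JWT token from the JSON response
TOKEN=$(curl -s -H "Content-Type: application/json" \
  -d "{\"username\": \"$USER\", \"password\": \"$PASS\"}" \
  https://hub.docker.com/v2/users/login/ | jq -r .token)

# 2) Delete the repository - same effect as the Delete repository button
curl -s -X DELETE -H "Authorization: JWT $TOKEN" \
  "https://hub.docker.com/v2/repositories/$USER/$REPO/"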

 

 

With that, the repository is cleanly deleted! Things you no longer use really are best cleaned up!


[NiFi] Apache NiFi Concepts #1

IT_Engineer/BigData|2022. 2. 3. 12:08

Ⅰ. Intro

1. What is NiFi?

- Apache NiFi is an Apache Software Foundation project designed to "automate the flow of data between software systems"

- A dataflow engine originally developed by the NSA (National Security Agency) and later donated to Apache

  * Open-sourced in 2014 as part of the NSA's Technology Transfer Program; based on NiagaraFiles

 

- A kind of ETL (Extract-Transform-Load) tool for ingesting data, processing it, and loading it

- An open-source implementation of the FBP concept that "collects and processes large volumes of data in distributed environments"

  * FBP (Flow-Based Programming) is a programming paradigm in which the data flow is defined up front and maintained while processes exchange data

 

- Well suited to real-time data processing, but complex computation is hard, so it is often used together with Spark or Storm

 

1.1 NiFi Concepts

- Composed of three main parts (FlowFile, Connection, Processor), plus the repositories and the Flow Controller, which are also worth knowing

 

1) FlowFile

  • The unit of data NiFi recognizes and moves around
  • Made up of two parts: attributes (metadata) and content
  • A copy is made every time it moves from Processor to Processor, which makes it traceable. (The content itself is not copied; only a pointer to where the content lives is copied, so the overhead is small.)
  • Expression Language support lets you manipulate its values in a variety of ways

 

2) Processor

  • Collects, transforms, and stores FlowFiles
  • More than 150 Processors are provided, and the set is extensible
  • With the ExecuteScript Processor you can do far more

 

3) Connection

  • Connects one Processor to another and passes FlowFiles between them
  • A queue of FlowFiles: provides FlowFile prioritization, expiration, and load control

 

4) Repository (저장소)

  • FlowFile Repository - when a FlowFile is created, its attribute values and the location of its content are stored here
  • Content Repository - stores the FlowFile content itself
  • Provenance Repository - every time a Processor handles a FlowFile, a history event is recorded here

 

5) Flow Controller

  • The scheduler that NiFi uses
  • Scheduling can be done at a fixed interval or with a cron expression, and in a clustered environment a task can be restricted to the Primary Node so it does not run concurrently
  • Controller Services let Processors share resources
  • For example, a DBCPConnectionPool can be used to share DB connection information between Processors
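
Once a NiFi instance is running, the pieces above can be inspected through its REST API. A minimal sketch, assuming an unsecured default NiFi listening on localhost:8080:

# Flow-level status: running/stopped processor counts, queued FlowFiles, etc.
curl -s http://localhost:8080/nifi-api/flow/status

# Processors in the root process group ("root" is an accepted alias for its id)
curl -s http://localhost:8080/nifi-api/process-groups/root/processors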

 
