BigQuery 랜덤 샘플링

RAND() 함수를 활용한 BigQuery 랜덤 샘플링

랜덤 샘플링은 RAND() 함수를 사용하면 된다.

N개의 행(row)을 랜덤 샘플링하는 쿼리는 다음과 같다.

--샘플 크기가 100개 행이라고 가정했을 때,

SELECT *
FROM `project.dataset.table`
WHERE RAND() < 100/(SELECT COUNT(*) FROM  `project.dataset.table`)
;

광고

Oracle Database SQL Certified Associate자격증 취득 (‘18.09.18)

Oracle Database SQL Certified Associate

Oracle Database SQL Certified Associate

An Oracle Database SQL Certified Associate demonstrates understanding of fundamental SQL concepts needed to undertake any database project. Candidates have illustrated a depth of knowledge of SQL and its use when working with the Oracle Database server, and a working knowledge of queries, insert, update and delete SQL statements as well as some Data Definition language and Data Control Language, the optimizer, tables and indexes, data modeling and normalization.

ISSUED BY

Oracle

ISSUED TO

Sion Lee

ISSUED ON

18 Sep 2018

SKILLS

WHAT IT TAKES TO EARN THIS BADGE

Oracle SQL #01 중복 값 추출

요구사항

테이블에서 컬럼의 중복된 값만 추출

1. GROUP BY 미 사용

SELECT *
FROM 테이블
WHERE 컬럼 IN(
SELECT 컬럼
FROM (
SELECT 컬럼, ROW_NUMBER() OVER(PARTITION BY 컬럼 ORDER BY NULL) AS 행번호
FROM 테이블
WHERE 행번호=’2′
)
ORDER BY 컬럼

2. GROUP BY 사용

SELECT *
FROM 테이블
WHERE 컬럼 IN(
SELECT 컬럼
FROM 테이블
GROUP BY 컬럼
HAVING COUNT(컬럼)>1
)
ORDER BY 컬럼

 

성능을 고려했을 때 GROUP BY를 사용하지 않는 것이 권고된다.