An ordinary salaried worker who just wants a quiet life. Technical discussion is welcome. If you would like to talk, feel free to get in touch: email bwcho75 at gmail dot com. 조대협




 

2 posts tagged 'IAM'

  1. 2016.06.18 BigQuery #3: Data structure and access (sharing) (3)
  2. 2013.05.20 Identity Management System (IDM) Overview
 

BigQuery #3: Data structure, data sharing, and permission management


조대협 (http://bcho.tistory.com)


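Now that we have covered BigQuery's concepts and internal architecture, let's look at its data structure and at how permissions on data are managed.

Data structure

BigQuery's data is organized in the following logical structure, which is not very different from a typical RDBMS.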





Data structure

Project

At the top is the project. A single project can hold multiple datasets.

Dataset

A dataset corresponds to a database in MySQL: it is a collection of tables. Data is shared with other users at this level.

Table

A table stores the actual data.

Job

Any command issued against the data, such as a query, a data load, or a delete, is called a job. Every job is logged, recording who ran what and when, for later auditing.
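As a rough illustration of the hierarchy above, here is a minimal Python sketch. The class names, field names, and sample values are hypothetical and are not part of any BigQuery API; they only mirror the project, dataset, table, and job relationship described in this section.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Table:
    name: str
    rows: list = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    tables: dict = field(default_factory=dict)  # table name -> Table


@dataclass
class Job:
    user: str          # who issued the command
    kind: str          # e.g. "query", "load", "delete"
    detail: str        # what was run
    started: datetime  # when it was run


@dataclass
class Project:
    name: str
    datasets: dict = field(default_factory=dict)  # dataset name -> Dataset
    job_log: list = field(default_factory=list)   # every job is kept for auditing

    def run_job(self, user, kind, detail):
        """Record a job; a real system would also execute it."""
        job = Job(user, kind, detail, datetime.now(timezone.utc))
        self.job_log.append(job)
        return job


# Hypothetical names, for illustration only.
project = Project("my-project")
project.datasets["sales"] = Dataset("sales")
project.datasets["sales"].tables["orders"] = Table("orders")
project.run_job("alice@example.com", "query", "SELECT COUNT(*) FROM sales.orders")
print(len(project.job_log), project.job_log[0].kind)
```

The point of the sketch is only that tables live inside datasets, datasets live inside a project, and jobs are recorded at the project level.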

Data types

A table can store the following data types.

  • STRING : UTF-8 encoded, up to 2 MB

  • INTEGER : 64 bit

  • FLOAT :  Double precision

  • BOOLEAN

  • RECORD : Collection of one or more fields

  • TIMESTAMP


The unusual one is RECORD: like a JSON object, a record is a data type that holds several values.


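The figure below shows the web UI screen for creating a table with two columns, ID and NAME.

Name is defined with the RECORD type described above, and inside the Name field there are two STRING fields, Last_name and First_name.



The resulting table has the following structure.


Columns inside a RECORD can use the data types listed above, such as STRING and INTEGER, and a RECORD can itself contain another RECORD. (It is very similar to a JSON data structure.)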
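The ID and Name table can also be written down as a schema definition. The sketch below builds it as Python data in the name/type/fields layout that BigQuery's JSON schema files use. The STRING type for ID and the sample row are assumptions, since the text does not state them, and `leaf_columns` is a hypothetical helper written just for this example.

```python
import json

# Schema for the table described above. The STRING type for ID is an
# assumption; the text only says the table has ID and Name columns.
schema = [
    {"name": "ID", "type": "STRING"},
    {
        "name": "Name",
        "type": "RECORD",
        "fields": [
            {"name": "Last_name", "type": "STRING"},
            {"name": "First_name", "type": "STRING"},
        ],
    },
]

# A hypothetical row that fits the schema; a RECORD value is a nested object.
row = {"ID": "1", "Name": {"Last_name": "Cho", "First_name": "Terry"}}


def leaf_columns(fields, prefix=""):
    """Flatten nested RECORD fields into dotted column paths."""
    paths = []
    for f in fields:
        path = prefix + f["name"]
        if f["type"] == "RECORD":
            paths.extend(leaf_columns(f["fields"], path + "."))
        else:
            paths.append(path)
    return paths


print(json.dumps(schema, indent=2))
print(leaf_columns(schema))  # ['ID', 'Name.Last_name', 'Name.First_name']
```

Because a RECORD can contain another RECORD, the helper recurses; the dotted paths are how nested fields are usually addressed in queries.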


REPEATED FIELD

Each column value in a table can be NULL (no value) or, as in an ordinary table, a single value. If the column is defined as REPEATED, however, a single field can hold multiple values (like a JSON array).




The screen above is the web UI defining a table with two STRING fields, ID and Basket, where Basket is defined as a REPEATED field.

The resulting table looks like this.



As the Terry.cho row shows, the single Basket column holds multiple values.
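To make the difference between a single-valued and a REPEATED column concrete, here is a small sketch. The schema mirrors the ID and Basket table above; the basket contents, the second row, and the `check_row` helper are made up for illustration.

```python
schema = [
    {"name": "ID", "type": "STRING", "mode": "NULLABLE"},
    {"name": "Basket", "type": "STRING", "mode": "REPEATED"},
]

# Hypothetical rows; the basket contents are invented for this example.
rows = [
    {"ID": "Terry.cho", "Basket": ["apple", "banana", "coffee"]},
    {"ID": "another.user", "Basket": []},
]


def check_row(row, schema):
    """Return a list of problems found when matching a row to the column modes."""
    problems = []
    for col in schema:
        value = row.get(col["name"])
        if col["mode"] == "REPEATED":
            if not isinstance(value, list):
                problems.append(col["name"] + " must be a list")
        elif isinstance(value, list):
            problems.append(col["name"] + " must be a single value")
    return problems


for row in rows:
    print(row["ID"], check_row(row, schema))
```

A REPEATED column always carries a list (possibly empty), while a NULLABLE column carries one value or nothing.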

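Permission management and sharing

BigQuery can be used by multiple people, each with their own account, and each user can be granted permission to modify, query, or delete specific datasets.

What permissions apply to

Access can be controlled at two levels: the project and the dataset. (Permissions cannot be applied at the table level.)


Who permissions can be granted to

  • Users
    Permissions on data can be granted to an individual user.

  • Google Groups
    Permissions can be granted to the members of a Google Group (https://groups.google.com/).

  • Users with a specific role
    Permissions can be granted to users who hold a particular role (administrator, department, and so on).


Permission types

Dataset permissions

  • READER : can query the data in the dataset

  • WRITER : can create tables in the dataset, and can query and append data

  • OWNER : can update and delete the dataset

Project permissions

  • Viewer : can run jobs and monitor running jobs; has READER permission on datasets

  • Editor : can create datasets; has WRITER permission on datasets

  • Owner : can run any job on datasets; can delete datasets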
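The relationship between the two lists can be sketched as a lookup table. This is only a reading of the lists above, not BigQuery's actual implementation: the action names are hypothetical, and two things are assumptions, namely that each dataset role includes everything the weaker role can do, and that the project Owner maps to the dataset OWNER role.

```python
# Actions allowed by each dataset-level role, following the list above.
# Assumption: roles are cumulative (OWNER can do everything WRITER can).
DATASET_ROLE_ACTIONS = {
    "READER": {"read_data"},
    "WRITER": {"read_data", "create_table", "append_data"},
    "OWNER": {"read_data", "create_table", "append_data",
              "update_dataset", "delete_dataset"},
}

# Dataset-level role implied by each project-level role.
# The Owner -> OWNER mapping is an assumption based on the description.
PROJECT_TO_DATASET_ROLE = {
    "Viewer": "READER",
    "Editor": "WRITER",
    "Owner": "OWNER",
}


def allowed(project_role, action):
    """Check whether a project-level role may perform a dataset action."""
    dataset_role = PROJECT_TO_DATASET_ROLE[project_role]
    return action in DATASET_ROLE_ACTIONS[dataset_role]


print(allowed("Viewer", "read_data"))
print(allowed("Viewer", "create_table"))
print(allowed("Editor", "create_table"))
print(allowed("Editor", "delete_dataset"))
print(allowed("Owner", "delete_dataset"))
```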



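Sharing data

Sharing a dataset

In the BigQuery web UI, select the dataset you want to share and choose the “Share dataset” menu; the dataset sharing dialog appears as shown below.



In this dialog, choose who to share with (a specific person, an email group, or a specific role) and the permission to grant (View, Edit, or Owner). The recipient can then access the dataset.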
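The same sharing settings can be pictured as a list of access entries on the dataset. The sketch below assumes the entry keys used by the BigQuery dataset resource (`role`, `userByEmail`, `groupByEmail`, `specialGroup`) as far as I know them; the e-mail addresses are placeholders and `roles_for` is a hypothetical helper, not an API call.

```python
import json

# Hypothetical sharing settings for one dataset. The addresses are placeholders.
access = [
    {"role": "READER", "userByEmail": "analyst@example.com"},     # a specific person
    {"role": "WRITER", "groupByEmail": "data-team@example.com"},  # an email group
    {"role": "OWNER", "specialGroup": "projectOwners"},           # a project role
]


def roles_for(email, groups, access):
    """Collect the dataset roles that apply to a user directly or via a group."""
    roles = set()
    for entry in access:
        if entry.get("userByEmail") == email:
            roles.add(entry["role"])
        if entry.get("groupByEmail") in groups:
            roles.add(entry["role"])
    return roles


print(json.dumps(access, indent=2))
print(roles_for("analyst@example.com", {"data-team@example.com"}, access))
```

The View, Edit, and Owner choices in the dialog correspond to the READER, WRITER, and OWNER dataset permissions listed earlier.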


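Sharing a dataset does not make the data show up automatically on the recipient's side. The recipient has to choose Switch Project in the project menu and then Display Project,


which opens the Add project window. Enter the name of the shared project there. (You need to know the name of the shared project.)

The project that contains the shared dataset, and the shared dataset itself, then appear on the left and can be accessed.




Figure. The github archive project and the github and day datasets under it, added after being shared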



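Sharing a project

Beyond individual datasets, you can share at a broader scope: the project itself.

Project permissions are managed in the Google Cloud menu under IAM & Admin.


 In the IAM menu, select Add members. As shown in the figure below, a text box for the user name appears, with a Select Role list box on the right. Choosing Projects there opens a submenu where you can grant roles such as Owner, Editor, and Viewer.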




We have looked at BigQuery's data structure, data types, and access permission model.

Next time, we will actually create a project and a dataset, create a table, and load data into it.


The author is an employee of Google Cloud; everything on this blog is personal opinion and is unrelated to the company.


  1. 최석민 2016.06.23 19:30

    I am a new developer who has just landed a first job, and I found this blog while looking up MSA. I knew I was a beginner, but I was amazed at how much you know. I will come back every day and read a post or two. Thank you for sharing such good writing and knowledge.

  2. 1466870852 2016.06.26 01:07

    Thanks for the good post.

  3. 1467604948 2016.07.04 13:02

    Great information, thanks.

Identity Management System (IDM) Overview

Terry.Cho (http://bcho.tistory.com)


1. Background

 

IDM (Identity Management system) is one of the most important and complex components in a typical IT system.

Pain points

Here are some typical pain points that come up in identity management.



Federation

1)        Enterprises build their IT systems with very simple, isolated identity management features. Each system has its own identity management features.

2)        The number of IT systems grows, and each one has its own identity management system.

3)        End users start to complain about logging in with a different ID for each system.

Lifecycle management & Provisioning

1)        After an employee leaves the company, the enterprise IT admin needs to delete all of that employee's identities across the systems.

2)        After a new employee joins, their identity needs to be created in email, ERP, CRM, etc. Some identity creation needs to be approved by a manager.



Without a common identity management platform, identity management becomes very painful.

B2B vs B2C

There are two main categories of identity management use: B2B and B2C.

B2B is enterprise IT. It is designed to manage internal users or a restricted number of end clients.

The characteristics of a B2B system are:

-       It has very complex scenarios to support the business.

-       It has a lot of package-based legacy systems such as ERP and CRM.

-       It needs very elaborate authorization control.

-       It has many types of roles (admin, manager, end user, org admin, etc.).

So products that support the B2B scenario focus on legacy system integration (provisioning, connectors, standards support such as WS-Security, SAML, and XACML), workflow support, and so on.

This area is mainly driven by enterprise vendors such as Oracle, CA, and IBM.



In contrast, the B2C area has different requirements.

A B2C system provides a service to consumers, such as an SNS.

-       It supports a huge number of end users (millions or more).

-       Role types and authorization control are very simple compared to the B2B scenario.

-       It uses an open-standard-based federation model (OAuth 2.0, OpenID, etc.).

-       It is deployed globally.

Trend & Implementation options

There are three approaches to building an IDM system.

1)       Option A. Build with an open source framework

Build the IDM system from scratch or reuse an open source framework.

To support just a single siloed system, a big identity management system is not required. In this case, the team simply builds the IDM system from scratch. For a small to medium number of users, an RDBMS backend is preferred. For a medium to large number of users, LDAP or Microsoft Active Directory is preferred.



To provide a more platform-like (or well-defined) IDM, an open source framework can be used.

Spring Security is one of the major players in this area. It is more focused on web-based applications.

Apache Shiro is another major player. It can support web applications and others (REST API based security control, etc.).

2)       Option B. Build with a niche vendor solution

If IT has to support more complex scenarios, it can consider a vendor solution. There are many solutions that are optimized for specific scenarios.

For example, Centrify is well optimized for Active Directory based single sign-on in B2B scenarios. PingIdentity is good for user account federation scenarios.

3)       Option C. Full package from an enterprise vendor

If the company has a lot of package-based legacy systems and needs sophisticated role-based authorization control, long-running workflows for authorization approval, auditing, and so on, a full packaged IDM is recommended.

This kind of IDM product is delivered by enterprise vendors such as Oracle, IBM, and CA.



Figure 1. IDM Gartner Magic Quadrant 2010

The trend is as follows.

B2C commonly uses Option A, and if there are more complex requirements it uses (or moves from Option A to) Option B. Big B2C companies like Facebook and Google build their own IDM systems with the Option A approach.



B2B uses Option A for small, isolated systems, Option B for restricted scenarios, and Option C for enterprise-wide use. Commonly, an enterprise IT system has its own LDAP server internally and provides minimal single sign-on with solutions (Option B).

 Commonly these setups have SSO and provisioning only, with no support for authorization and other features. Authorization support requires a lot of customization on both the IDM side and the service application side. And a full package vendor solution is very expensive, complex, and hard to manage.

2. IDM System common features

Here are the common features provided by traditional IDM systems.

1)       User Management

It manages user identity over the full life cycle. It creates, updates, and deletes user identity information.

•   Lifecycle management

This feature manages the whole life cycle of a user identity, from creation to removal. Depending on requirements, a user identity can expire based on predefined logic. It can also manage password expiration dates and so on.

•   Workflow

Some user identity creation, or the granting of a new authorization permission, needs approval. For example, creating a bank account requires checking the user's identity. This kind of approval requires a long-running process.

It is implemented using a workflow engine (e.g. BPM).

•   Provisioning

When a user identity is created or modified, it needs to be replicated to other systems. For example, when a new email account is created in the email system, a new account needs to be created in the sales system. In that case the user profile should be replicated (provisioned). This is one of the most important features of a centralized IDM.
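Provisioning can be sketched as a central store that pushes each change to a set of connectors. Everything in this sketch (`Connector`, `IdentityStore`, the system names, and the profile fields) is hypothetical and only illustrates the idea of replication; real products use system-specific connectors and usually handle failures and retries as well.

```python
class Connector:
    """Hypothetical adapter that creates or updates accounts in one target system."""

    def __init__(self, system_name):
        self.system_name = system_name
        self.accounts = {}  # user id -> profile copy held by the target system

    def upsert(self, user_id, profile):
        self.accounts[user_id] = dict(profile)


class IdentityStore:
    """Central store that pushes every change to the registered connectors."""

    def __init__(self, connectors):
        self.users = {}
        self.connectors = connectors

    def save_user(self, user_id, profile):
        self.users[user_id] = dict(profile)
        for connector in self.connectors:
            connector.upsert(user_id, profile)


email = Connector("email")
sales = Connector("sales")
idm = IdentityStore([email, sales])

idm.save_user("jdoe", {"name": "J. Doe", "dept": "sales"})      # creation
idm.save_user("jdoe", {"name": "J. Doe", "dept": "marketing"})  # modification
print(sales.accounts["jdoe"]["dept"])  # marketing
```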

•   Delegated Admin

A single IDM admin is not enough to manage user identities. If the company has many organizations and authorization control is required, a single IDM admin cannot cover all of the requests. In that case restricted admin authority needs to be delegated to someone (e.g. managers in the organization). This feature is delegated admin.

One more point: if a delegated admin leaves the company, the delegated authority should be transferred to another user in the IDM.

•   Identification management

In certain systems, proving the user's identity ("Who is the user?") is very important.

In banking and stock trading systems, identity proofing is a very important issue. To support it, IDM manages additional information such as user certificates, fingerprints, and other biometric data.

2)       Access Management

Access management answers "Can this user access this resource?". It allows the system to provide restricted access.

•   Authentication

Authentication is the process of determining whether someone or something is, in fact, who or what it is declared to be. This is commonly done by checking the user's identity against credentials (ID and password).
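A minimal sketch of an ID and password check, using only the Python standard library. Storing a salted hash instead of the password itself is my assumption about how the credential store would be built; the iteration count, user name, and password are illustrative values.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Derive a salted hash so the plain password is never stored."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def authenticate(user_id, password, credentials):
    """Return True only if the user exists and the password matches."""
    record = credentials.get(user_id)
    if record is None:
        return False
    salt, expected = record
    _, actual = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(actual, expected)


# Hypothetical credential store: user id -> (salt, hash).
credentials = {"jdoe": hash_password("correct horse")}
print(authenticate("jdoe", "correct horse", credentials))
print(authenticate("jdoe", "wrong", credentials))
```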

•   Authorization (ACL, Entitlement)

This is the process of granting or denying access to a resource.

In other words, it is controlled by an ACL (Access Control List), which describes "who can access what resource".

In authorization scenarios, there are three types of access control.

       RBAC (Role Based Access Control)

Resource access is controlled by user role. An individual user can have a number of roles. For example, a user can be "Partner", "Admin", or "End User". Access to a resource is granted by predefined access control rules based on each role.

RBAC is one of the most broadly used authorization methods.

       DAC (Discretionary Access Control)

It is more flexible compared to RBAC. DAC manages authority based on user identity (user ID or its associated group).

       MAC (Mandatory Access Control)
Users are given permission to resources by the system administrator. Only the admin can grant permission to a resource.
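Of the three, RBAC is the easiest to show in a few lines. The sketch below covers RBAC only. The role names come from the example above, while the users, resources, and actions are hypothetical.

```python
# Permissions granted to each role. Role names are taken from the example
# above; the (resource, action) pairs are hypothetical.
ROLE_PERMISSIONS = {
    "Admin": {("report", "read"), ("report", "write"), ("user", "manage")},
    "Partner": {("report", "read")},
    "End User": set(),
}

# A user can hold several roles at once.
USER_ROLES = {
    "alice": {"Admin"},
    "bob": {"Partner", "End User"},
}


def is_allowed(user, resource, action):
    """Grant access if any of the user's roles carries the permission."""
    for role in USER_ROLES.get(user, set()):
        if (resource, action) in ROLE_PERMISSIONS.get(role, set()):
            return True
    return False


print(is_allowed("alice", "report", "write"))
print(is_allowed("bob", "report", "read"))
print(is_allowed("bob", "report", "write"))
```

The check never looks at the individual user beyond the role lookup, which is what separates RBAC from identity-based DAC.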

•   Federation (SSO)

If there are a number of systems and the user has logged in once to one of them, the user does not need to log in to the other systems again. This is Single Sign-On.

There are standards that support SSO, such as SAML, CAS, and Kerberos.
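The core idea behind SSO can be shown with a toy signed token: the system that performed the login issues a signed statement, and the other systems accept it instead of asking for the password again. This is not SAML, CAS, or Kerberos; it is a deliberately simplified sketch, and the shared secret, function names, and token layout are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import time

SHARED_SECRET = b"demo-secret"  # hypothetical key shared by the login system and every service


def issue_token(user_id, ttl_seconds=3600, now=None):
    """Login side: sign a short-lived statement that the user has logged in."""
    now = time.time() if now is None else now
    payload = json.dumps({"sub": user_id, "exp": now + ttl_seconds}).encode("utf-8")
    signature = hmac.new(SHARED_SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode("ascii") + "."
            + base64.urlsafe_b64encode(signature).decode("ascii"))


def verify_token(token, now=None):
    """Service side: return the user id if the token is valid, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SHARED_SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # tampered or signed with another key
    claims = json.loads(payload)
    if claims["exp"] < now:
        return None  # expired
    return claims["sub"]


token = issue_token("jdoe")
print(verify_token(token))
```

Real standards add much more, such as asymmetric signatures, audience restrictions, and logout handling, but the trust relationship is the same: services rely on the issuer's signature rather than on the user's password.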

3)       Repository

The repository persists user identity and profile.

A user identity has the user ID and password for login, an ACL (Access Control List), and a user profile that contains user-related data, for example name, address, and email.

This repository is read-intensive. It also needs to support a tree-like structure, because user identity also incorporates the user's organization structure. For this reason LDAP is a common solution for the repository.

If the system has to support a global rollout, it should also consider regulatory issues. Some user information cannot be stored outside the user's country. A legal check is required when designing the user profile schema.

And to support the global rollout, data replication across data centers should be supported.

4)       Audit & Reporting

Audit means "who did what to which resource?". It enables the admin to track resource usage, denied resource access, and so on. In some systems, the access log can be used to track user patterns; web access log analysis is one example. In addition, the resource access log can be used to meter service usage (cloud computing scenarios, etc.).

For denied access, it needs to support notification messages to the admin and reporting. To prevent access that should be denied, it also needs to support a "black list".

This area consists of logging, gathering, analysis, reporting, and archiving. Nowadays, it is implemented using big data technology (logging frameworks, etc.).
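A small sketch of what an audit record and the features built on it might look like. The record fields follow the "who did what to which resource" question above. The rule that a user is blacklisted after three denied attempts is purely an assumption for illustration, as are the names and sample events.

```python
from collections import Counter
from datetime import datetime, timezone

audit_log = []     # append-only list of audit records
blacklist = set()  # users blocked after repeated denials
DENIAL_LIMIT = 3   # hypothetical threshold


def record_access(user, action, resource, allowed):
    """Store who did what to which resource, and whether it was allowed."""
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
        "allowed": allowed,
    })
    denials = Counter(e["user"] for e in audit_log if not e["allowed"])
    if denials[user] >= DENIAL_LIMIT:
        blacklist.add(user)


def usage_by_user():
    """Count allowed accesses per user, e.g. as input for metering."""
    return Counter(e["user"] for e in audit_log if e["allowed"])


record_access("alice", "read", "/reports/q1", True)
record_access("alice", "read", "/reports/q2", True)
for _ in range(3):
    record_access("mallory", "read", "/admin", False)

print(usage_by_user())
print(blacklist)
```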

5)       Integrations

The integration feature integrates multiple identity management systems.

There are many perspectives. Replicating user profiles from one system to others is covered by "provisioning". Authentication across a number of systems can be covered by Single Sign-On. Authorization across a number of systems can be covered by an XACML-based authorization system.

To simplify, integration can be viewed from the three perspectives below.

•   Open standard support

Integration is an old problem in the identity management area, so there are already open standards that address it.

In the B2C area, OpenID and OAuth are the major players, supporting authentication and delegated authorization respectively.

In the B2B area, there are many standards, such as:

-        SAML, WS-Security : support SSO & Federation

-        XACML : support authorization

-        LDAP or Microsoft Active Directory : repository integration

-        WS-Trust : API Security

•   Internal service integration

In an enterprise, there are a lot of internal systems. Legacy enterprise systems (ERP, CRM) in particular have very complex user profile schemas and organization structures, and sometimes they do not support open standards. So special integration connectors are needed to support the integration (provisioning, authorization, etc.).

Connector support is the main feature of internal service integration.

•   External service integration

This covers identity integration with external systems that reside outside the company.

-        B2C integration - There are already well-known B2C services such as Google, Windows Live, Facebook, and Twitter accounts. B2C integration scenarios are usually implemented with open standards (OAuth, OpenID, Active Directory, etc.).

-        B2B integration - This area can have various scenarios depending on requirements. Suppose company A provides company B's service under a white label. They need to support SSO, so companies A and B need to integrate their authentication using SSO. In this scenario, if company B charges for the service, user identities need to be provisioned from company A to company B to measure usage.

B2B integration happens in an ad hoc way. There is no common approach in this area. The best way is to clarify the gaps between the two identity management systems and design the integration scenario case by case. This approach is similar to EAI (Enterprise Application Integration).

-        B2B (Cloud) integration - There are already enterprise cloud services such as Salesforce.com and Microsoft Office 365. These services need to integrate with the company-wide IDM system.

3. IDM deployment model

To understand the IDM deployment models, we have to understand some IDM terms first.

•   IdP (Identity Provider) : This is the IDM. It persists user identity, and authenticates and authorizes incoming requests.

•   SP (Service Provider) : It provides a service to the end user. It has resources, and access to those resources is restricted by the IdP. Example: a web site.

•   Token : User credentials (ID and password, or a login token used for authentication).

There are three types of deployment model.

Isolated IDM Model

Each service provider has its own IdP. The end user has to log in to each service provider with a different identity.



Centralized IDM Model

All services share a single IdP. This is the ideal model. The end user can log in and access everything with a single user identity.

All access control in all Service Providers is governed by a single ACL, so it is consistent.

But it is hard to achieve in the real world. Products (open source or commercial solutions) already have their own IdP internally. This model can be supported if all of the Service Providers are built from scratch.



Federated IDM Model

From the end user's perspective, it is the same as the centralized IDM model. The end user logs in to the Service Providers with a single user identity. But each Service Provider has a different IdP in the backend.

This is the common use case in the IDM area. Authentication is integrated by SSO (federation) and authorization is covered by entitlement (XACML, etc.).




Here is a reference architecture for the federation model.



The user management system creates and updates user profiles. The profile is propagated to each IdP server through the provisioning components, so each Service Provider has a recent version of the user profile.

The end user logs in to a Service Provider. The login is federated using SSO.


 

