Feature store lakeformation #5599

Open
BassemHalim wants to merge 34 commits into aws:master from BassemHalim:feature-store-lakeformation

@BassemHalim BassemHalim commented Mar 4, 2026

Description

This PR adds Lake Formation integration to SageMaker Feature Store, enabling customers to govern access to their offline store data through AWS Lake Formation instead of relying solely on IAM policies.

This simplifies the manual setup described in the following AWS blog post:
https://aws.amazon.com/blogs/machine-learning/control-access-to-amazon-sagemaker-feature-store-offline-using-aws-lake-formation/

New Features

LakeFormationConfig — declarative configuration for Lake Formation governance:

class LakeFormationConfig(Base):
    enabled: bool = False
    use_service_linked_role: bool = True
    registration_role_arn: Optional[str] = None
    disable_hybrid_access_mode: bool  # Required, no default
    acknowledge_risk: bool            # Required, no default
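
For readers who want to experiment with the shape of this configuration outside the SDK, here is a minimal standalone sketch using a plain dataclass. The class name `LakeFormationConfigSketch` is hypothetical, the real class derives from the SDK's `Base`, and the validation rule in `__post_init__` is an assumption that this PR description does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LakeFormationConfigSketch:
    """Stand-in for the PR's LakeFormationConfig; not the SDK class."""

    # Required fields come first because a plain dataclass does not allow
    # fields without defaults to follow fields with defaults.
    disable_hybrid_access_mode: bool
    acknowledge_risk: bool
    enabled: bool = False
    use_service_linked_role: bool = True
    registration_role_arn: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed rule (not stated in the PR): a registration role ARN is
        # needed only when the service-linked role is not used.
        if not self.use_service_linked_role and not self.registration_role_arn:
            raise ValueError(
                "registration_role_arn is required when "
                "use_service_linked_role is False"
            )
```

Omitting either required field raises a `TypeError` at construction time, which mirrors the "no default" intent of the two required flags.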

FeatureGroupManager.create() — added lake_formation_config parameter

  • Enables Lake Formation governance at Feature Group creation time
  • Automatically waits for Feature Group to reach "Created" status before configuring Lake Formation

FeatureGroupManager.enable_lake_formation() — new method

  • Enables Lake Formation on existing Feature Groups

  • Three-phase setup:

    1. Registers S3 location with Lake Formation
    2. Grants permissions to the execution role on the Glue table
    3. Optionally revokes IAMAllowedPrincipals permissions from the Glue table (controlled by disable_hybrid_access_mode)
  • Fail-fast behavior with clear error reporting at each phase

  • Interactive confirmation prompts warn users about risks (controllable via acknowledge_risk)

  • Logs recommended S3 deny policy as a warning
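
The three phases listed above could be sketched roughly as follows. This is not the PR's implementation: the function name `enable_lake_formation_sketch`, its parameters, and the granted permission list are assumptions made for illustration. The client is passed in so the flow can be exercised without AWS access; the three methods called on it (`register_resource`, `grant_permissions`, `revoke_permissions`) are existing boto3 Lake Formation client operations.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger(__name__)


def enable_lake_formation_sketch(
    lf_client: Any,
    *,
    s3_arn: str,
    database: str,
    table: str,
    role_arn: str,
    disable_hybrid_access_mode: bool,
) -> Dict[str, bool]:
    """Run the phases in order and stop at the first failure (fail-fast)."""
    table_resource = {"Table": {"DatabaseName": database, "Name": table}}

    steps: List[Tuple[str, Callable[[], Any]]] = [
        (
            "register_s3_location",
            lambda: lf_client.register_resource(
                ResourceArn=s3_arn, UseServiceLinkedRole=True
            ),
        ),
        (
            "grant_table_permissions",
            lambda: lf_client.grant_permissions(
                Principal={"DataLakePrincipalIdentifier": role_arn},
                Resource=table_resource,
                # Placeholder permission set; the PR does not list the exact one.
                Permissions=["SELECT", "DESCRIBE"],
            ),
        ),
    ]
    if disable_hybrid_access_mode:
        steps.append(
            (
                "revoke_iam_allowed_principals",
                lambda: lf_client.revoke_permissions(
                    Principal={"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
                    Resource=table_resource,
                    Permissions=["ALL"],
                ),
            )
        )

    results = {name: False for name, _ in steps}
    for name, call in steps:
        try:
            call()
        except Exception:
            # Report what is still pending before re-raising the error.
            pending = [n for n, done in results.items() if not done]
            logger.warning("Lake Formation setup incomplete; pending phases: %s", pending)
            raise
        results[name] = True
    return results
```

With boto3 installed and credentials configured, `lf_client` would typically be `boto3.client("lakeformation")`. Injecting the client also makes each phase easy to test with a stub.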

Usage

Enable at creation:

  from sagemaker.mlops.feature_store import FeatureGroupManager, LakeFormationConfig

  lf_config = LakeFormationConfig(
      enabled=True,
      disable_hybrid_access_mode=True,
      acknowledge_risk=True,  # skip interactive prompt
  )

  fg = FeatureGroupManager.create(
      feature_group_name="my-feature-group",
      # ... other params ...
      lake_formation_config=lf_config,
  )

Enable on an existing Feature Group:

  fg = FeatureGroupManager.get("my-feature-group")
  result = fg.enable_lake_formation(
      disable_hybrid_access_mode=True,
      acknowledge_risk=True,
  )

Testing

  • Unit tests: comprehensive coverage of all new methods, validation logic, and acknowledge_risk behavior
  • Integration tests: end-to-end tests for both creation workflows and negative scenarios

Notes

  • S3 deny policy is logged as a recommendation (not applied automatically) to avoid breaking existing workflows
  • disable_hybrid_access_mode is a required field with no default; users must explicitly choose whether to revoke IAMAllowedPrincipals permissions
  • acknowledge_risk controls interactive confirmation: None (default) prompts via input(), True skips the prompt, False aborts with RuntimeError
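
The tri-state acknowledge_risk behavior in the last note could look something like the sketch below. The helper name `confirm_risk`, the prompt text, and the handling of a declined prompt are assumptions; only the three cases (None prompts via input(), True skips, False raises RuntimeError) come from the description.

```python
from typing import Callable, Optional


def confirm_risk(
    acknowledge_risk: Optional[bool],
    prompt: Callable[[str], str] = input,
) -> None:
    """Return normally if the caller may proceed; raise RuntimeError otherwise."""
    if acknowledge_risk is True:
        # Risk acknowledged up front: skip the interactive prompt.
        return
    if acknowledge_risk is False:
        # Explicit refusal: abort without prompting.
        raise RuntimeError("Lake Formation setup aborted: risk not acknowledged")

    # acknowledge_risk is None: ask interactively.
    answer = prompt("Enabling Lake Formation changes data access. Continue? [y/N]: ")
    if answer.strip().lower() not in ("y", "yes"):
        # Assumed behavior: declining the prompt also aborts.
        raise RuntimeError("Lake Formation setup aborted by user")
```

Passing the prompt function as a parameter keeps the helper usable in non-interactive environments such as CI, where a blocking `input()` call would hang.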

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@nargokul nargokul left a comment

Please fix integ tests

@BassemHalim BassemHalim force-pushed the feature-store-lakeformation branch from 6f00f8a to 3baff6c Compare April 9, 2026 00:27
@BassemHalim BassemHalim force-pushed the feature-store-lakeformation branch from 3baff6c to e706a5c Compare April 9, 2026 01:12
BassemHalim and others added 2 commits April 14, 2026 13:31
- Remove unused datetime imports
- Remove debug print statement from resource registration
- Update docstring to clarify S3 deny bucket policy is recommended
- Refactor error handling to use fail-fast with deferred warnings pattern
- Store phase errors instead of immediately raising to allow all phases to attempt execution
- Move warning logs before error re-raise so incomplete steps are reported before exception
- Simplify phase execution logic by checking phase_error status before attempting each phase
- Improve error messages to guide users on re-running the method after fixing issues
Labels

component: feature store Relates to the SageMaker Feature Store Platform

5 participants