Google DataFlow cannot read and write in different locations (Python SDK v0.5.5)


Problem description

I'm writing a very basic DataFlow pipeline using the Python SDK v0.5.5. The pipeline uses a BigQuerySource with a query passed in, which queries BigQuery tables from datasets that reside in the EU.
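For context, a minimal sketch of this kind of pipeline, assuming the Beam-style API surface of that SDK generation (the table and output names are hypothetical, and import paths and runner names may differ slightly in v0.5.5):

import apache_beam as beam  # v0.5.x releases may expose this module as google.cloud.dataflow

# Hypothetical EU-resident table; substitute your own project:dataset.table.
query = 'SELECT word, word_count FROM [my-project:eu_dataset.my_table]'

p = beam.Pipeline('DirectRunner')  # older releases name this DirectPipelineRunner
rows = p | beam.io.Read(beam.io.BigQuerySource(query=query))
rows | beam.io.WriteToText('gs://my-bucket/output')  # hypothetical output path
p.run()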

When executing the pipeline, I get the following error (project name anonymized):

HttpError: HttpError accessing <https://www.googleapis.com/bigquery/v2/projects/XXXXX/queries/93bbbecbc470470cb1bbb9c22bd83e9d?alt=json&maxResults=10000>: response: <{'status': '400', 'content-length': '292', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'expires': 'Thu, 09 Feb 2017 10:28:04 GMT', 'vary': 'Origin, X-Origin', 'server': 'GSE', '-content-encoding': 'gzip', 'cache-control': 'private, max-age=0', 'date': 'Thu, 09 Feb 2017 10:28:04 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; ma=2592000; v="35,34"', 'content-type': 'application/json; charset=UTF-8'}>, content <{
 "error": {
  "errors": [
   {
    "domain": "global",
    "reason": "invalid",
    "message": "Cannot read and write in different locations: source: EU, destination: US"
   }
  ],
  "code": 400,
  "message": "Cannot read and write in different locations: source: EU, destination: US"
 }
}

The error also occurs when specifying a project, dataset, and table name directly. However, there is no error when selecting data from the publicly available datasets (which reside in the US, like shakespeare). I also have jobs running v0.4.4 of the SDK that don't hit this error.

The difference between these versions is the creation of a temp dataset, as shown by this warning at pipeline startup:

WARNING:root:Dataset does not exist so we will create it

I've briefly looked at the different versions of the SDK, and the difference seems to be around this temp dataset. It looks like the current version creates a temp dataset by default with its location set to US (taken from master; a concrete illustration follows the list below):

  • Create dataset
  • Default dataset location
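To make the location issue concrete, here is a small illustration using the modern google-cloud-bigquery client rather than the SDK's internal code (project and dataset IDs are hypothetical): when a dataset is created without an explicit location, BigQuery defaults it to US, and a query against EU tables cannot write its results there.

from google.cloud import bigquery

client = bigquery.Client(project='my-project')  # hypothetical project

# What the SDK effectively does: create the temp dataset with no explicit
# location, so BigQuery falls back to its default, US.
temp_us = bigquery.Dataset('my-project.temp_dataset_us')
client.create_dataset(temp_us)  # lands in US; EU query results cannot be written here

# What it would need to do instead: match the location of the source data.
temp_eu = bigquery.Dataset('my-project.temp_dataset_eu')
temp_eu.location = 'EU'
client.create_dataset(temp_eu)  # lands in EU; EU query results can be written here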

I haven't found a way to disable the creation of these temp datasets. Am I overlooking something, or does this indeed no longer work when selecting data from EU datasets?

Recommended answer

Thanks for reporting this issue. I assume you are using the DirectRunner. We changed the implementation of the BigQuery read transform for the DirectRunner to create a temporary dataset (for SDK versions 0.5.1 and later) in order to support large datasets. It seems we are not setting the region correctly here. We'll look into fixing this.

This issue should not occur if you use the DataflowRunner, which creates temporary datasets in the correct region.
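As a workaround until the DirectRunner fix lands, the job can be submitted to the Dataflow service instead. A sketch of the standard service options, assuming placeholder project and bucket names (in 0.5.x releases the runner may be spelled DataflowPipelineRunner):

import apache_beam as beam

# All names below are placeholders; supply your own project and GCS bucket.
argv = [
    '--project=XXXXX',
    '--job_name=eu-bigquery-read',
    '--staging_location=gs://my-bucket/staging',
    '--temp_location=gs://my-bucket/temp',
    '--runner=DataflowRunner',  # may be DataflowPipelineRunner in 0.5.x
]

p = beam.Pipeline(argv=argv)
rows = p | beam.io.Read(beam.io.BigQuerySource(
    query='SELECT word FROM [my-project:eu_dataset.my_table]'))  # hypothetical table
p.run()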
