
[Question]: run_pretrain.py fails with an error on multiple GPUs but runs normally on a single GPU #11205

@Buddingpopp


Please ask your question

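Training completes normally on a single GPU. With the same configuration, running on a single machine with two GPUs using
python -u -m paddle.distributed.launch --gpus "0,1" run_pretrain.py ./config/qwen/pretrain_argument_0p5b.json
produces the following errors: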

OSError: [Errno 101] Network is unreachable

urllib3.exceptions.NewConnectionError: HTTPSConnection(host='bj.bcebos.com', port=443): Failed to establish a new connection: [Errno 101] Network is unreachable

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='bj.bcebos.com', port=443): Max retries exceeded with url: /paddlenlp/models/community/Qwen/Qwen2.5-0.5B/added_tokens.json (Caused by NewConnectionError("HTTPSConnection(host='bj.bcebos.com', port=443): Failed to establish a new connection: [Errno 101] Network is unreachable"))

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='bj.bcebos.com', port=443): Max retries exceeded with url: /paddlenlp/models/community/Qwen/Qwen2.5-0.5B/added_tokens.json (Caused by NewConnectionError("HTTPSConnection(host='bj.bcebos.com', port=443): Failed to establish a new connection: [Errno 101] Network is unreachable"))

OSError: Can't load the model for 'Qwen/Qwen2.5-0.5B'. If you were trying to load it from 'BOS', make sure you don't have a local directory with the same name. Otherwise, make sure 'Qwen/Qwen2.5-0.5B' is the correct path to a directory containing one of the ['added_tokens.json']

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='bj.bcebos.com', port=443): Max retries exceeded with url: /paddlenlp/models/community/Qwen/Qwen2.5-0.5B/added_tokens.json (Caused by NewConnectionError("HTTPSConnection(host='bj.bcebos.com', port=443): Failed to establish a new connection: [Errno 101] Network is unreachable"))

OSError: Can't load the model for 'Qwen/Qwen2.5-0.5B'. If you were trying to load it from 'BOS', make sure you don't have a local directory with the same name. Otherwise, make sure 'Qwen/Qwen2.5-0.5B' is the correct path to a directory containing one of the ['added_tokens.json']
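
Since the traceback shows every failure coming from a connection attempt to bj.bcebos.com on port 443, one way to narrow this down might be to run a small connectivity check under the same launcher and compare it with a plain single-process run. The sketch below is only a diagnostic idea, not part of PaddleNLP: `can_reach` is a hypothetical helper, and reading `PADDLE_TRAINER_ID` assumes the launcher exports that variable to each worker. Note that a raw socket connection bypasses any HTTP proxy, so the proxy-related environment variables are printed as well for comparison.

```python
import os
import socket
from typing import Tuple


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> Tuple[bool, str]:
    """Try a direct TCP connection and report whether it succeeded.

    Hypothetical helper for diagnosis only; it does not go through a proxy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "ok"
    except OSError as exc:
        # Covers "Network is unreachable", refused connections, DNS failures, timeouts.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Assumption: the launcher sets PADDLE_TRAINER_ID per worker; "?" is shown otherwise.
    rank = os.environ.get("PADDLE_TRAINER_ID", "?")

    # Print proxy settings so the single-process and launched runs can be compared.
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy", "NO_PROXY", "no_proxy"):
        print(f"[rank {rank}] {name}={os.environ.get(name)!r}")

    # Host and port are taken from the traceback above.
    reachable, detail = can_reach("bj.bcebos.com", 443)
    print(f"[rank {rank}] bj.bcebos.com:443 reachable={reachable} ({detail})")
```

If this is saved as, for example, check_net.py (a hypothetical file name), running it once with plain python and once through python -u -m paddle.distributed.launch --gpus "0,1" would show whether the launched workers see a different network or proxy environment than the single-GPU run.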


Labels: question (Further information is requested)
