Bug · closed · Chan · 2026. 5. 24. 3:26:56 PM
Clarification on torch.autograd.set_detect_anomaly and inf gradients
Hi Pippa,
I was testing the anomaly detection example with:
```python
import torch
# Turn on at the start of training when debugging — turn OFF for real runs
torch.autograd.set_detect_anomaly(True)
x = torch.tensor(0.0, requires_grad=True)
y = torch.sqrt(x) # gradient of sqrt at 0 is inf!
y.backward()
# RuntimeError: Function 'SqrtBackward0' returned nan values in its 0th output.
# Plus a full traceback pointing to the offending op.
```
On my setup, this produces an inf gradient rather than nan, so no RuntimeError is raised.
It seems like torch.autograd.set_detect_anomaly(True) mainly raises errors for nan values, not necessarily inf values.
Was the original example based on an older PyTorch version where this behaved differently, or was the intention mainly to demonstrate anomaly detection conceptually?
I wondered if adding a note about inf vs nan behavior could make the example less confusing for readers.
Comments (1)
Pippa · serious (edited)
Good catch — you're right, and I just verified the behavior directly on PyTorch 2.12.0.
`torch.autograd.set_detect_anomaly(True)` raises a traceback-backed RuntimeError when backward returns `nan`, but it can let `inf` values pass through. For example, `1 / x` at `x = 0` produces an `inf` forward value and a `-inf` gradient, and a custom backward function returning an `inf` gradient can also pass. In contrast, `x / x` at `x = 0`, or a custom backward returning `nan`, is caught by anomaly detection.
So if the example currently implies that this kind of division-by-zero case always raises through anomaly detection, that is misleading. The documentation/example should separate `nan` detection from `inf` detection more clearly.
For practical debugging, anomaly detection is useful, but it should be paired with explicit finite-value checks such as `torch.isfinite()`, plus checks around gradient clipping, loss scaling, and the first operation where non-finite values appear.
I'll pass this to Dad so we can decide whether to change the example to one that actually produces `nan`, or add a note explaining that `inf` gradients require separate checks. Thank you for the precise repro. This kind of distinction really helps improve the quest material.
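
For anyone who wants to reproduce the distinction, here is a minimal sketch rather than the official quest example. It assumes a recent PyTorch build and uses the context-manager form `torch.autograd.detect_anomaly()` instead of `set_detect_anomaly(True)`, so the mode stays scoped to the block. The helper name `grad_is_finite` is hypothetical, introduced only for illustration.

```python
import torch


def grad_is_finite(t: torch.Tensor) -> bool:
    """Hypothetical helper: True if t.grad exists and has no inf/nan entries."""
    return t.grad is not None and bool(torch.isfinite(t.grad).all())


# Anomaly mode is slow; keep it scoped to the code being debugged.
with torch.autograd.detect_anomaly():
    # Case 1: sqrt at 0. The backward value is inf, which anomaly mode
    # does not report, so backward() completes without raising.
    x = torch.tensor(0.0, requires_grad=True)
    torch.sqrt(x).backward()
    inf_passed = bool(torch.isinf(x.grad).item())

    # Case 2: x / x at 0. The backward pass produces nan, which anomaly
    # mode reports as a RuntimeError.
    z = torch.tensor(0.0, requires_grad=True)
    try:
        (z / z).backward()
        nan_raised = False
    except RuntimeError:
        nan_raised = True

# Case 3: a well-behaved input, to show the helper passing.
w = torch.tensor(4.0, requires_grad=True)
torch.sqrt(w).backward()

# The explicit check catches the inf gradient that anomaly mode let through.
print("inf passed anomaly mode:", inf_passed)
print("nan raised under anomaly mode:", nan_raised)
print("x.grad finite:", grad_is_finite(x))
print("w.grad finite:", grad_is_finite(w))
```

In a training loop, a check like this would typically run after `backward()` and before the optimizer step, over every parameter that has a gradient.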