• nimble@programming.dev
    4 days ago

    Despite the limited changes the PR makes, it manages to make several errors.

    According to benchmarks in issue #31130:

    • With broadcast: np.column_stack → 36.47 µs, np.vstack().T → 27.67 µs (24% faster)
    • Without broadcast: np.column_stack → 20.63 µs, np.vstack().T → 13.18 µs (36% faster)

    The PR fails to calculate the speed-up correctly: it reports the reduction in run time (-24% and -36%) as "faster", when the actual speed-ups are +32% and +57%. The figures themselves are also just regurgitated from the original issue.
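
For anyone who wants to check the arithmetic, here is a quick sketch using only the four timings quoted above. The helper names `speedup_pct` and `time_reduction_pct` are made up for illustration; they are not from the PR or from NumPy.

```python
def speedup_pct(old_us, new_us):
    """Percent increase in throughput: how much *faster* the new code is."""
    return (old_us / new_us - 1) * 100


def time_reduction_pct(old_us, new_us):
    """Percent decrease in run time: how much *less time* the new code takes."""
    return (1 - new_us / old_us) * 100


# Timings in microseconds as quoted in the PR description:
# (np.column_stack, np.vstack().T)
timings = {
    "with broadcast": (36.47, 27.67),
    "without broadcast": (20.63, 13.18),
}

for label, (old_us, new_us) in timings.items():
    print(
        f"{label}: {time_reduction_pct(old_us, new_us):.0f}% less time, "
        f"{speedup_pct(old_us, new_us):.0f}% faster"
    )
```

The two numbers measure different things, so a 24% reduction in time and a 32% speed-up describe the same pair of timings.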

    The improvement comes from np.vstack().T doing contiguous memory copies and returning a view, whereas np.column_stack has to interleave elements in memory.

    Regurgitated information from the original issue.
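
The quoted claim about views and memory layout is at least easy to check. This is a rough sketch assuming two small 1-D float arrays (my own toy inputs, not the arrays touched by the PR). It only shows the layout difference and says nothing about whether the timings hold up.

```python
import numpy as np

# Toy 1-D inputs, chosen only for illustration.
a = np.arange(3.0)
b = np.arange(3.0, 6.0)

cs = np.column_stack([a, b])  # builds a fresh (3, 2) array
vt = np.vstack([a, b]).T      # builds a (2, 3) array, then transposes it

# For 1-D inputs both give the same values and shape.
same_values = np.array_equal(cs, vt)

# column_stack returns a C-contiguous array, so a and b end up interleaved.
cs_is_c_contiguous = cs.flags.c_contiguous

# The transpose is a view onto the vstack result: no second copy is made,
# and each input stays in its own contiguous run (Fortran order).
vt_is_view = vt.base is not None
vt_is_f_contiguous = vt.flags.f_contiguous

print(same_values, cs_is_c_contiguous, vt_is_view, vt_is_f_contiguous)
```

Note that the result of `np.vstack().T` is not C-contiguous, so code downstream that expects C order may end up copying anyway.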

    Changes

    • Modified 3 files
    • Replaced 3 occurrences of np.column_stack with np.vstack().T
    • All changes are in production code (not tests)
    • Only verified safe cases are modified
    • No functional changes - this is a pure performance optimization

    The PR actually changes 4 files, not 3.