Profiling#

Sometimes computing the likelihood is not as fast as we would like. PyTensor provides handy profiling tools, which PyMC wraps in model.profile. This function returns a ProfileStats object with information about the underlying PyTensor operations. Here we will profile the likelihood and the gradient of the stochastic volatility example.
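As a quick orientation before the full example, the basic pattern looks like the sketch below. The toy model is hypothetical, and the n keyword (which we assume controls the number of timed calls, defaulting to 1000) should be checked against your PyMC version.

import pymc as pm

with pm.Model() as toy_model:
    x = pm.Normal("x")  # hypothetical one-parameter model, for illustration only

# Compile the logp graph, call it n times, and print timing statistics
# (`n` is assumed to set the number of profiled calls)
toy_model.profile(toy_model.logp(), n=1000).summary()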

First, we build the model.

import numpy as np
import pandas as pd
import pymc as pm

print(f"Running on PyMC v{pm.__version__}")
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Running on PyMC v5.10.0
RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)
# Load the data
returns = pd.read_csv(pm.get_data("SP500.csv"), index_col=0, parse_dates=True)
# Stochastic volatility example
with pm.Model() as model:
    sigma = pm.Exponential("sigma", 1.0 / 0.02, initval=0.1)
    nu = pm.Exponential("nu", 1.0 / 10)
    s = pm.GaussianRandomWalk("s", sigma**-2, shape=returns.shape[0])
    r = pm.StudentT("r", nu, lam=np.exp(-2 * s), observed=returns["change"])
C:\Users\Divya Tiwari\anaconda3\envs\pymc_env\Lib\site-packages\pymc\distributions\timeseries.py:293: UserWarning: Initial distribution not specified, defaulting to `Normal.dist(0, 100)`.You can specify an init_dist manually to suppress this warning.
  warnings.warn(
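The UserWarning above is emitted because no initial distribution was specified for the random walk, so it falls back to Normal.dist(0, 100). To make the default explicit and suppress the warning, you can pass init_dist yourself; a sketch of just that line, keeping the default behavior:

# Inside the model block: passing the default init_dist explicitly
# silences the UserWarning without changing the model
s = pm.GaussianRandomWalk(
    "s", sigma**-2, init_dist=pm.Normal.dist(0, 100), shape=returns.shape[0]
)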

Then we call the profile function and summarize its return value.

# Profiling of the logp call
model.profile(model.logp()).summary()
Function profiling
==================
  Message: C:\Users\Divya Tiwari\anaconda3\envs\pymc_env\Lib\site-packages\pymc\pytensorf.py:990
  Time in 1000 calls to Function.__call__: 1.331489e-01s
  Time in Function.vm.__call__: 0.10964459989918396s (82.347%)
  Time in thunks: 0.1032404899597168s (77.538%)
  Total compilation time: 9.425617e-01s
    Number of Apply nodes: 26
    PyTensor rewrite time: 8.539493e-01s
       PyTensor validate time: 2.782300e-03s
    PyTensor Linker time (includes C, CUDA code generation/compiling): 0.07904159999452531s
       Import time 3.465450e-02s
       Node make_thunk time 7.485420e-02s
           Node Composite{...}(s, [1.0435322 ... 54666e-07], ExpandDims{axis=0}.0, Composite{...}.0, Composite{...}.1, Composite{gammaln((0.5 * i0))}.0) time 4.614200e-03s
           Node Composite{...}(sigma_log__, 50.00000111758712, 0.0, 3.912023027779888, -inf) time 4.260600e-03s
           Node Composite{reciprocal(sqr(i0))}(ExpandDims{axis=0}.0) time 4.128100e-03s
           Node MakeVector{dtype='bool'}(All{axes=None}.0, Composite{...}.1) time 4.086700e-03s
           Node Composite{...}(nu_log__, 0, 0.1, 0.0, -2.302585092994046, -inf) time 3.909600e-03s

Time in all call to pytensor.grad() 0.000000e+00s
Time since pytensor import 4.949s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  86.3%    86.3%       0.089s       8.10e-06s     C    11000      11   pytensor.tensor.elemwise.Elemwise
   7.7%    94.0%       0.008s       3.98e-06s     C     2000       2   pytensor.tensor.math.Sum
   2.9%    96.9%       0.003s       7.51e-07s     C     4000       4   pytensor.tensor.elemwise.DimShuffle
   2.1%    99.0%       0.002s       1.06e-06s     C     2000       2   pytensor.tensor.basic.Join
   1.0%    99.9%       0.001s       3.28e-07s     C     3000       3   pytensor.tensor.subtensor.Subtensor
   0.1%   100.0%       0.000s       4.35e-08s     C     2000       2   pytensor.tensor.math.All
   0.0%   100.0%       0.000s       0.00e+00s     C     1000       1   pytensor.tensor.basic.Split
   0.0%   100.0%       0.000s       0.00e+00s     C     1000       1   pytensor.tensor.basic.MakeVector
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  77.4%    77.4%       0.080s       7.99e-05s     C     1000        1   Composite{...}
   7.7%    85.1%       0.008s       3.98e-06s     C     2000        2   Sum{axes=None}
   3.9%    89.0%       0.004s       4.00e-06s     C     1000        1   Composite{((-0.5 * sqr((i0 - i1))) - 0.9189385332046727)}
   3.1%    92.1%       0.003s       3.17e-06s     C     1000        1   Sub
   2.9%    95.0%       0.003s       7.51e-07s     C     4000        4   ExpandDims{axis=0}
   2.1%    97.0%       0.002s       1.06e-06s     C     2000        2   Join
   1.9%    99.0%       0.002s       1.98e-06s     C     1000        1   Switch
   1.0%    99.9%       0.001s       9.83e-07s     C     1000        1   Subtensor{i}
   0.1%   100.0%       0.000s       4.35e-08s     C     2000        2   All{axes=None}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{...}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Subtensor{:stop}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Subtensor{start:}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{...}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{gammaln((0.5 * i0))}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{...}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{reciprocal(sqr(i0))}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Split{2}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{((-0.5 * sqr((0.01 * i0))) - 5.524108719192764)}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   MakeVector{dtype='bool'}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Add
   ... (remaining 0 Ops account for   0.00%(0.00s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  77.4%    77.4%       0.080s       7.99e-05s   1000    13   Composite{...}(s, [1.0435322 ... 54666e-07], ExpandDims{axis=0}.0, Composite{...}.0, Composite{...}.1, Composite{gammaln((0.5 * i0))}.0)
   4.7%    82.2%       0.005s       4.89e-06s   1000    24   Sum{axes=None}(lam > 0, nu > 0)
   3.9%    86.0%       0.004s       4.00e-06s   1000    17   Composite{((-0.5 * sqr((i0 - i1))) - 0.9189385332046727)}(Split{2}.1, Composite{reciprocal(sqr(i0))}.0)
   3.1%    89.1%       0.003s       3.17e-06s   1000     6   Sub(Subtensor{start:}.0, Subtensor{:stop}.0)
   3.0%    92.1%       0.003s       3.08e-06s   1000    21   Sum{axes=None}(s_logprob)
   2.0%    94.0%       0.002s       2.02e-06s   1000    19   Join(0, sigma > 0, sigma > 0)
   1.9%    96.0%       0.002s       2.00e-06s   1000    22   ExpandDims{axis=0}(All{axes=None}.0)
   1.9%    97.9%       0.002s       1.98e-06s   1000    23   Switch(ExpandDims{axis=0}.0, Composite{...}.1, [-inf])
   1.0%    98.9%       0.001s       1.00e-06s   1000     8   ExpandDims{axis=0}(Composite{...}.0)
   1.0%    99.8%       0.001s       9.83e-07s   1000     3   Subtensor{i}(s, 0)
   0.1%    99.9%       0.000s       1.07e-07s   1000    11   Join(0, ExpandDims{axis=0}.0, Sub.0)
   0.1%   100.0%       0.000s       8.70e-08s   1000    15   All{axes=None}(Composite{...}.0)
   0.0%   100.0%       0.000s       0.00e+00s   1000    25   Add(mu >= 0, sigma_log__, mu >= 0, nu_log__, s_logprob, Sum{axes=None}.0)
   0.0%   100.0%       0.000s       0.00e+00s   1000    20   All{axes=None}(MakeVector{dtype='bool'}.0)
   0.0%   100.0%       0.000s       0.00e+00s   1000    18   MakeVector{dtype='bool'}(All{axes=None}.0, Composite{...}.1)
   0.0%   100.0%       0.000s       0.00e+00s   1000    16   Composite{((-0.5 * sqr((0.01 * i0))) - 5.524108719192764)}(Split{2}.0)
   0.0%   100.0%       0.000s       0.00e+00s   1000    14   Split{2}(Join.0, 0, [   1 2905])
   0.0%   100.0%       0.000s       0.00e+00s   1000    12   Composite{reciprocal(sqr(i0))}(ExpandDims{axis=0}.0)
   0.0%   100.0%       0.000s       0.00e+00s   1000    10   Composite{...}(ExpandDims{axis=0}.0)
   0.0%   100.0%       0.000s       0.00e+00s   1000     9   Composite{gammaln((0.5 * i0))}(ExpandDims{axis=0}.0)
   ... (remaining 6 Apply instances account for 0.00%(0.00s) of the runtime)

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
  - Try the PyTensor flag floatX=float32
  - Try installing amdlibm and set the PyTensor flag lib__amblibm=True. This speeds up only some Elemwise operation.
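By default the summary truncates the Ops and Apply tables after 20 rows. To see more, keep the ProfileStats object and pass explicit limits to summary(); the keyword names below (n_ops_to_print, n_apply_to_print) are assumptions based on PyTensor's ProfileStats API, so verify them against your installed version.

# Keep the ProfileStats object instead of printing it immediately
stats = model.profile(model.logp())
# n_ops_to_print / n_apply_to_print are assumed ProfileStats.summary kwargs
stats.summary(n_ops_to_print=30, n_apply_to_print=30)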
# Profiling of the gradient call dlogp/dx
model.profile(pm.gradient(model.logp(), vars=None)).summary()
Function profiling
==================
  Message: C:\Users\Divya Tiwari\anaconda3\envs\pymc_env\Lib\site-packages\pymc\pytensorf.py:990
  Time in 1000 calls to Function.__call__: 1.622821e-01s
  Time in Function.vm.__call__: 0.13946900120936334s (85.942%)
  Time in thunks: 0.13289427757263184s (81.891%)
  Total compilation time: 9.082990e-01s
    Number of Apply nodes: 34
    PyTensor rewrite time: 8.388561e-01s
       PyTensor validate time: 8.517501e-03s
    PyTensor Linker time (includes C, CUDA code generation/compiling): 0.059757000009994954s
       Import time 2.345270e-02s
       Node make_thunk time 5.872960e-02s
           Node Composite{...}(s, ExpandDims{axis=0}.0, [1.0435322 ... 54666e-07], ExpandDims{axis=0}.0) time 4.175900e-03s
           Node Composite{...}(nu_log__, 1.0, 0.5, -1453.0, 1453.0, -0.1, 0.0, 0) time 4.137400e-03s
           Node Composite{(i4 + i5 + (i0 * i1 * i2 * i3))}(-2.0, Sum{axes=None}.0, Composite{...}.1, Composite{...}.0, Composite{...}.2, 1.0) time 3.595000e-03s
           Node Composite{reciprocal(sqr(i0))}(ExpandDims{axis=0}.0) time 3.540400e-03s
           Node Composite{...}(sigma_log__, -50.00000111758712, 0.0, 0) time 3.315400e-03s

Time in all call to pytensor.grad() 2.053549e-01s
Time since pytensor import 6.475s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  72.1%    72.1%       0.096s       9.58e-06s     C    10000      10   pytensor.tensor.elemwise.Elemwise
   6.8%    78.9%       0.009s       1.81e-06s     C     5000       5   pytensor.tensor.elemwise.DimShuffle
   6.8%    85.7%       0.009s       3.00e-06s     C     3000       3   pytensor.tensor.math.Sum
   6.0%    91.7%       0.008s       2.67e-06s     C     3000       3   pytensor.tensor.basic.Join
   3.8%    95.5%       0.005s       1.67e-06s     C     3000       3   pytensor.tensor.subtensor.IncSubtensor
   2.3%    97.7%       0.003s       1.50e-06s     C     2000       2   pytensor.tensor.basic.Split
   1.5%    99.2%       0.002s       1.01e-06s     C     2000       2   pytensor.tensor.shape.Reshape
   0.8%   100.0%       0.001s       3.34e-07s     C     3000       3   pytensor.tensor.subtensor.Subtensor
   0.0%   100.0%       0.000s       0.00e+00s     C     3000       3   pytensor.tensor.shape.SpecifyShape
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  65.3%    65.3%       0.087s       8.68e-05s     C     1000        1   Composite{...}
   6.8%    72.1%       0.009s       3.00e-06s     C     3000        3   Sum{axes=None}
   6.0%    78.1%       0.008s       2.67e-06s     C     3000        3   Join
   3.8%    81.9%       0.005s       1.26e-06s     C     4000        4   ExpandDims{axis=0}
   3.0%    84.9%       0.004s       4.00e-06s     C     1000        1   DropDims{axis=0}
   3.0%    87.9%       0.004s       4.00e-06s     C     1000        1   Composite{...}
   2.3%    90.2%       0.003s       1.50e-06s     C     2000        2   Split{2}
   1.5%    91.7%       0.002s       1.01e-06s     C     2000        2   Reshape{1}
   1.5%    93.2%       0.002s       2.00e-06s     C     1000        1   IncSubtensor{i}
   1.5%    94.7%       0.002s       2.00e-06s     C     1000        1   Sub
   1.5%    96.2%       0.002s       2.00e-06s     C     1000        1   IncSubtensor{:stop}
   1.5%    97.7%       0.002s       2.00e-06s     C     1000        1   Composite{...}
   0.8%    98.5%       0.001s       1.00e-06s     C     1000        1   Subtensor{:stop}
   0.8%    99.2%       0.001s       1.00e-06s     C     1000        1   IncSubtensor{start:}
   0.8%   100.0%       0.001s       1.00e-06s     C     1000        1   Composite{(i4 + i5 + i6 + i7 + (i0 * i1 * i2 * i3) + i8 + i9)}
   0.0%   100.0%       0.000s       0.00e+00s     C     3000        3   SpecifyShape
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{...}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Subtensor{start:}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Subtensor{i}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{reciprocal(sqr(i0))}
   ... (remaining 3 Ops account for   0.00%(0.00s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  65.3%    65.3%       0.087s       8.68e-05s   1000    12   Composite{...}(s, ExpandDims{axis=0}.0, [1.0435322 ... 54666e-07], ExpandDims{axis=0}.0)
   3.8%    69.1%       0.005s       5.00e-06s   1000    33   Join(0, SpecifyShape.0, SpecifyShape.0, (d__logp/ds))
   3.0%    72.1%       0.004s       4.00e-06s   1000    28   DropDims{axis=0}(SpecifyShape.0)
   3.0%    75.1%       0.004s       4.00e-06s   1000    15   Sum{axes=None}(Composite{...}.2)
   3.0%    78.1%       0.004s       4.00e-06s   1000    16   Composite{...}(Split{2}.1, Composite{reciprocal(sqr(i0))}.0)
   2.3%    80.4%       0.003s       3.03e-06s   1000     9   ExpandDims{axis=0}(Composite{...}.1)
   2.3%    82.7%       0.003s       3.00e-06s   1000    13   Split{2}(Join.0, 0, [   1 2905])
   2.3%    84.9%       0.003s       3.00e-06s   1000    14   Sum{axes=None}(Composite{...}.1)
   1.5%    86.5%       0.002s       2.03e-06s   1000    25   Reshape{1}((d__logp/dsigma_log__), [-1])
   1.5%    88.0%       0.002s       2.00e-06s   1000    29   IncSubtensor{i}(Composite{...}.0, DropDims{axis=0}.0, 0)
   1.5%    89.5%       0.002s       2.00e-06s   1000    32   IncSubtensor{:stop}(IncSubtensor{start:}.0, Neg.0, -1)
   1.5%    91.0%       0.002s       2.00e-06s   1000     6   Sub(Subtensor{start:}.0, Subtensor{:stop}.0)
   1.5%    92.5%       0.002s       2.00e-06s   1000    11   Join(0, ExpandDims{axis=0}.0, Sub.0)
   1.5%    94.0%       0.002s       2.00e-06s   1000    19   Sum{axes=None}(Composite{...}.0)
   1.5%    95.5%       0.002s       2.00e-06s   1000     4   Composite{...}(nu_log__, 1.0, 0.5, -1453.0, 1453.0, -0.1, 0.0, 0)
   0.8%    96.2%       0.001s       1.00e-06s   1000     1   Subtensor{:stop}(s, -1)
   0.8%    97.0%       0.001s       1.00e-06s   1000    30   IncSubtensor{start:}(IncSubtensor{i}.0, Split{2}.1, 1)
   0.8%    97.7%       0.001s       1.00e-06s   1000    21   Join(0, Mul.0, Composite{...}.1)
   0.8%    98.5%       0.001s       1.00e-06s   1000    17   Composite{(i4 + i5 + i6 + i7 + (i0 * i1 * i2 * i3) + i8 + i9)}(-1.0, 0.5, Sum{axes=None}.0, Composite{...}.1, Composite{...}.4, Composite{...}.3, -1452.9999999999736, Composite{...}.2, Sum{axes=None}.0, 1.0)
   0.8%    99.2%       0.001s       1.00e-06s   1000     7   ExpandDims{axis=0}(Subtensor{i}.0)
   ... (remaining 14 Apply instances account for 0.75%(0.00s) of the runtime)

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
  - Try the PyTensor flag floatX=float32
  - Try installing amdlibm and set the PyTensor flag lib__amblibm=True. This speeds up only some Elemwise operation.
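During NUTS sampling the log-probability and its gradient are evaluated together, so it can also be informative to profile one function that returns both. A minimal sketch, assuming model.profile accepts a list of output variables (outs is forwarded to the compiled PyTensor function):

logp = model.logp()
# Profile a single function computing both logp and dlogp/dx,
# mirroring the joint evaluation done during sampling
model.profile([logp, pm.gradient(logp)]).summary()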
%load_ext watermark
%watermark -n -u -v -iv -w
Last updated: Mon Dec 04 2023

Python implementation: CPython
Python version       : 3.11.6
IPython version      : 8.15.0

pandas: 2.1.3
pymc  : 5.10.0
numpy : 1.26.2

Watermark: 2.4.3