.. = RISC-V "V" Vector Extension

Version 1.0-rc2-draft

Contributors include: Alon Amid, Krste Asanovic, Allen Baum, Alex
Bradbury, Tony Brewer, Chris Celio, Aliaksei Chapyzhenka, Silviu
Chiricescu, Ken Dockser, Bob Dreyer, Roger Espasa, Sean Halle, John
Hauser, David Horner, Bruce Hoult, Bill Huffman, Nicholas Knight,
Constantine Korikov,
Ben Korpan, Hanna Kruppe, Yunsup Lee, Guy Lemieux, Grigorios Magklis,
Filip Moc, Rich Newell, Albert Ou, David Patterson, Colin Schmidt,
Alex Solomatnikov, Steve Wallach, Andrew Waterman, Jim Wilson.



########################
Changes from v1.0-rc1
########################


********************
No changes so far.
********************



################
Introduction
################

..
  This document is a draft of the second release candidate for version
  1.0 of the RISC-V vector extension for public review.


This document is a draft of the second release candidate for version 1.0
of the RISC-V vector extension, intended for public review.

*This is not the frozen version of 1.0 for public review.*

..
  NOTE: When finally approved and the release candidate tag is removed,
  version 1.0 is intended to be sent out for public review as part of
  the RISC-V International ratification process.  Version 1.0 is also
  considered stable enough to begin developing toolchains, functional
  simulators, and initial implementations, including in upstream
  software projects, and is not expected to have major functionality
  changes except if serious issues are discovered during ratification.
  Once ratified, the spec will be given version 2.0.


.. note::

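  When finally approved and the release candidate tag is removed,
  version 1.0 is intended to be sent out for public review as part of
  the RISC-V International ratification process.  Version 1.0 is also
  considered stable enough to begin developing toolchains, functional
  simulators, and initial implementations, including in upstream
  software projects, and is not expected to have major functionality
  changes unless serious issues are discovered during ratification.
  Once ratified, the spec will be given version 2.0.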
  
..
  This draft spec includes the complete set of currently defined vector
  instructions.  Section :ref:`sec-vector-extensions`  lists the standard
  vector extensions and which instructions and element widths are
  supported by each extension.


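This draft spec includes the complete set of currently defined vector
instructions.  Section :ref:`sec-vector-extensions` lists the standard
vector extensions and which instructions and element widths are
supported by each extension.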


####################################################
Implementation-defined Constant Parameters
####################################################


Each hart supporting a vector extension defines two parameters:

..
  * The maximum size of a vector element that any operation can produce or consume in bits, *ELEN* {ge} 8, which
  must be a power of 2.
  * The number of bits in a single vector register, *VLEN*, which must
  be a power of 2 and must be no greater than 2^16^.


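* The maximum size in bits of a vector element that any operation can produce or consume, *ELEN* ≥ 8, which must be a power of 2.
* The number of bits in a single vector register, *VLEN*, which must be a power of 2 and must be no greater than 2\ :sup:`16`.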
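
The two constraints above can be checked mechanically. The following is a
minimal sketch, not part of the specification; the function names are
hypothetical, and extensions or profiles may impose further limits that are
not modelled here.

.. code-block:: python

   def is_power_of_two(n):
       """True for 1, 2, 4, 8, ... (positive integers with one bit set)."""
       return n > 0 and (n & (n - 1)) == 0


   def valid_parameters(elen, vlen):
       """Check only the base constraints on ELEN and VLEN stated above."""
       elen_ok = is_power_of_two(elen) and elen >= 8
       vlen_ok = is_power_of_two(vlen) and vlen <= 2**16
       return elen_ok and vlen_ok

For example, ``valid_parameters(64, 128)`` holds, while a VLEN of
2\ :sup:`17` or an ELEN of 48 is rejected.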

..
  Standard vector extensions (Section :ref:`sec-vector-extensions` ) and
  architecture profiles may set further constraints on *ELEN* and *VLEN*.


Standard vector extensions (Section :ref:`sec-vector-extensions`) and architecture profiles may set further constraints on *ELEN* and *VLEN*.

..
  NOTE: The upper limit on VLEN allows software to know that indices
  will fit into 16 bits (largest VLMAX of 65,536 occurs for LMUL=8 and
  SEW=8 with VLEN=65,536).  Any future extension beyond 64Kib per vector
  register will require new configuration instructions such that
  software using the old configuration instructions does not see greater
  vector lengths.


.. note::

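  The upper limit on VLEN allows software to know that indices will fit
  into 16 bits (the largest VLMAX of 65,536 occurs for LMUL=8 and SEW=8
  with VLEN=65,536).  Any future extension beyond 64Kib per vector
  register will require new configuration instructions such that
  software using the old configuration instructions does not see greater
  vector lengths.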
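
The 16-bit bound on element indices can be confirmed with a short
calculation. This is an illustrative sketch only; the helper name ``vlmax``
is hypothetical and uses the formula VLMAX = LMUL*VLEN/SEW given later in
this document.

.. code-block:: python

   def vlmax(vlen, sew, lmul):
       """VLMAX = LMUL * VLEN / SEW for integer LMUL."""
       return lmul * vlen // sew


   # Largest configuration allowed by the VLEN limit.
   largest_vlmax = vlmax(vlen=65536, sew=8, lmul=8)
   # Element indices run from 0 to VLMAX-1.
   largest_index = largest_vlmax - 1
   index_bits = largest_index.bit_length()

Here ``index_bits`` is the number of bits needed to hold the largest
element index.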
  
..
  The ISA supports writing binary code that under certain constraints
  will execute portably on harts with different values for the VLEN
  parameter, provided both support the required element types.


The ISA supports writing binary code that under certain constraints
will execute portably on harts with different values for the VLEN
parameter, provided both harts support the required element types.

..
  NOTE: Code can be written that will expose differences in
  implementation parameters.


.. note::

  Code can be written that will expose differences in implementation
  parameters.
  
..
  NOTE: In general, thread contexts with active vector state cannot be
  migrated during execution between harts that have any difference in
  VLEN or ELEN parameters.


.. note::

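  In general, thread contexts with active vector state cannot be
  migrated during execution between harts that have any difference in
  VLEN or ELEN parameters.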
  

####################################################
Vector Extension Programming Model
####################################################

..
  The vector extension adds 32 vector registers, and seven unprivileged
  CSRs (`vstart`, `vxsat`, `vxrm`, `vcsr`, `vtype`, `vl`, `vlenb`) to a
  base scalar RISC-V ISA.


The vector extension adds 32 vector registers and seven unprivileged
CSRs (`vstart`, `vxsat`, `vxrm`, `vcsr`, `vtype`, `vl`, `vlenb`) to a
base scalar RISC-V ISA.

..
  .New vector CSRs
  [cols="2,2,2,10"]
  [%autowidth]
  |===
  | Address | Privilege | Name   | Description
  
  | 0x008 | URW | vstart | Vector start position
  | 0x009 | URW | vxsat  | Fixed-Point Saturate Flag
  | 0x00A | URW | vxrm   | Fixed-Point Rounding Mode
  | 0x00F | URW | vcsr   | Vector control and status register
  | 0xC20 | URO | vl     | Vector length
  | 0xC21 | URO | vtype  | Vector data type register
  | 0xC22 | URO | vlenb  | VLEN/8 (vector register length in bytes)
  |===


+---------+-----------+--------+------------------------------------------+
| Address | Privilege | Name   | Description                              |
+=========+===========+========+==========================================+
| 0x008   | URW       | vstart | Vector start position                    |
+---------+-----------+--------+------------------------------------------+
| 0x009   | URW       | vxsat  | Fixed-Point Saturate Flag                |
+---------+-----------+--------+------------------------------------------+
| 0x00A   | URW       | vxrm   | Fixed-Point Rounding Mode                |
+---------+-----------+--------+------------------------------------------+
| 0x00F   | URW       | vcsr   | Vector control and status register       |
+---------+-----------+--------+------------------------------------------+
| 0xC20   | URO       | vl     | Vector length                            |
+---------+-----------+--------+------------------------------------------+
| 0xC21   | URO       | vtype  | Vector data type register                |
+---------+-----------+--------+------------------------------------------+
| 0xC22   | URO       | vlenb  | VLEN/8 (vector register length in bytes) |
+---------+-----------+--------+------------------------------------------+
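
The Privilege column follows from the CSR address itself. The sketch below
assumes the standard RISC-V CSR address convention from the privileged
architecture (bits [11:10] equal to ``0b11`` mark a read-only CSR, bits
[9:8] give the lowest privilege level allowed to access it); that convention
is not stated in this document, and the helper names are hypothetical.

.. code-block:: python

   # Addresses and names copied from the table above.
   VECTOR_CSRS = {
       0x008: "vstart",
       0x009: "vxsat",
       0x00A: "vxrm",
       0x00F: "vcsr",
       0xC20: "vl",
       0xC21: "vtype",
       0xC22: "vlenb",
   }


   def csr_is_read_only(addr):
       """Assumed convention: address bits [11:10] == 0b11 means read-only."""
       return ((addr >> 10) & 0b11) == 0b11


   def csr_min_privilege(addr):
       """Assumed convention: address bits [9:8], where 0 is user level."""
       return (addr >> 8) & 0b11

Under this convention the 0x00x CSRs decode as user read-write (URW) and
the 0xC2x CSRs as user read-only (URO), matching the table.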


*************************
Vector Registers
*************************

..
  The vector extension adds 32 architectural vector registers,
  `v0`-`v31` to the base scalar RISC-V ISA.


The vector extension adds 32 architectural vector registers,
`v0`-`v31`, to the base scalar RISC-V ISA.


Each vector register has a fixed size of VLEN bits.

..
  NOTE: Zfinx ("F in X") is a new ISA option under consideration where
  floating-point instructions take their arguments from the integer
  register file.  The 1.0 vector extension is also compatible with Zfinx.


.. note::

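  Zfinx ("F in X") is a new ISA option under consideration where
  floating-point instructions take their arguments from the integer
  register file.  The 1.0 vector extension is also compatible with Zfinx.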
  

**************************************************************
Vector Context Status in `mstatus`
**************************************************************

..
  A vector context status field, `VS`, is added to `mstatus[10:9]` and shadowed
  in `sstatus[10:9]`.  It is defined analogously to the floating-point context
  status field, `FS`.


A vector context status field, `VS`, is added to `mstatus[10:9]` and
shadowed in `sstatus[10:9]`.  It is defined analogously to the
floating-point context status field, `FS`.

..
  Attempts to execute any vector instruction, or to access the vector
  CSRs, raise an illegal-instruction exception when the `VS` field is
  set to Off.


Attempts to execute any vector instruction, or to access the vector
CSRs, raise an illegal-instruction exception when the `VS` field is
set to Off.

..
  When the `VS` field is set to Initial or Clean, executing any
  instruction that changes vector state, including the vector CSRs, will
  change `VS` to Dirty.
  Implementations may also change the `VS` field from Initial or Clean to Dirty
  at any time, even when there is no change in vector state.


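When the `VS` field is set to Initial or Clean, executing any
instruction that changes vector state, including the vector CSRs, will
change `VS` to Dirty.
Implementations may also change the `VS` field from Initial or Clean to
Dirty at any time, even when there is no change in vector state.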
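
The `VS` behavior described above can be modelled as a small state machine.
This is a sketch under stated assumptions: the numeric encodings (Off=0,
Initial=1, Clean=2, Dirty=3) are taken from the `FS` field that `VS` is
said to mirror, and the function names are hypothetical. The model shows
only the required transitions; an implementation is also allowed to move to
Dirty earlier.

.. code-block:: python

   from enum import IntEnum


   class VS(IntEnum):
       # Encodings assumed to match the floating-point FS field.
       OFF = 0
       INITIAL = 1
       CLEAN = 2
       DIRTY = 3


   def read_vs(mstatus):
       """Extract the VS field from mstatus[10:9]."""
       return VS((mstatus >> 9) & 0b11)


   def after_vector_instruction(vs, changes_state):
       """Return the VS value after executing one vector instruction."""
       if vs is VS.OFF:
           # Any vector instruction or vector CSR access traps when VS is Off.
           raise RuntimeError("illegal-instruction exception")
       if changes_state:
           return VS.DIRTY
       return vs

A context-switch routine would typically save vector state only when
``read_vs`` reports ``VS.DIRTY``.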

..
  NOTE: Accurate setting of the `VS` field is an optimization.  Software
  will typically use VS to reduce context swap overhead.


.. note::

  Accurate setting of the `VS` field is an optimization.  Software
  will typically use VS to reduce context swap overhead.
  
..
  Implementations may have a writable `misa.v` field.  Analogous to the
  way in which the floating-point unit is handled, the `mstatus.vs`
  field may exist even if `misa.v` is clear.


Implementations may have a writable `misa.v` field.  Analogous to the
way in which the floating-point unit is handled, the `mstatus.vs`
field may exist even if `misa.v` is clear.


..
  NOTE: Allowing `mstatus.vs` to exist when `misa.v` is clear, enables
  vector emulation and simplifies handling of `mstatus.vs` in systems
  with writable `misa.v`.


.. note::

  Allowing `mstatus.vs` to exist when `misa.v` is clear enables
  vector emulation and simplifies handling of `mstatus.vs` in systems
  with writable `misa.v`.
  

************************************
Vector type register, `vtype`
************************************

..
  The read-only XLEN-wide *vector* *type* CSR, `vtype` provides the
  default type used to interpret the contents of the vector register
  file, and can only be updated by `vset{i}vl{i}` instructions. The vector
  type also determines the organization of elements in each vector
  register, and how multiple vector registers are grouped.


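The read-only XLEN-wide *vector type* CSR, `vtype`, provides the
default type used to interpret the contents of the vector register
file, and can only be updated by `vset{i}vl{i}` instructions.  The vector
type also determines the organization of elements in each vector
register, and how multiple vector registers are grouped.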

..
  NOTE: Allowing updates only via the `vset{i}vl{i}` instructions
  simplifies maintenance of the `vtype` register state.


.. note::

  Allowing updates only via the `vset{i}vl{i}` instructions
  simplifies maintenance of the `vtype` register state.
  
..
  The `vtype` register has five fields,
  `vill`, `vma`, `vta`, `vsew[2:0]`, and `vlmul[2:0]`.


The `vtype` register has five fields: `vill`, `vma`, `vta`, `vsew[2:0]`, and `vlmul[2:0]`.

include::vtype-format.adoc[]
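
The five fields can be unpacked from a raw `vtype` value as sketched below.
The register format figure is not reproduced in this text, so the bit
positions used here (`vlmul` in bits 2:0, `vsew` in bits 5:3, `vta` in
bit 6, `vma` in bit 7, `vill` in bit XLEN-1) are an assumption based on the
version 1.0 register layout; the function name is hypothetical.

.. code-block:: python

   def decode_vtype(vtype, xlen=64):
       """Split a raw vtype value into its five fields (assumed bit layout)."""
       return {
           "vill": (vtype >> (xlen - 1)) & 1,
           "vma": (vtype >> 7) & 1,
           "vta": (vtype >> 6) & 1,
           "vsew": (vtype >> 3) & 0b111,
           "vlmul": vtype & 0b111,
       }


   # Value for "e32, m4, ta, ma" under the assumed layout: vsew=2, vlmul=2.
   example = (1 << 7) | (1 << 6) | (0b010 << 3) | 0b010
   fields = decode_vtype(example)

The individual field encodings are described in the subsections that follow.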

..
  NOTE: A small implementation supporting ELEN=32 requires only seven
  bits of state in `vtype`: two bits for `ma` and `ta`, two bits for
  `vsew[1:0]` and three bits for `vlmul[2:0]`.  The illegal value
  represented by `vill` can be internally encoded using the illegal 64-bit
  combination in `vsew[1:0]` without requiring an additional storage
  bit to hold `vill`.


.. note::

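  A small implementation supporting ELEN=32 requires only seven
  bits of state in `vtype`: two bits for `ma` and `ta`, two bits for
  `vsew[1:0]` and three bits for `vlmul[2:0]`.  The illegal value
  represented by `vill` can be internally encoded using the illegal 64-bit
  combination in `vsew[1:0]` without requiring an additional storage
  bit to hold `vill`.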
  
..
  NOTE: Further standard and custom vector extensions will extend these
  fields to support a greater variety of data types.


.. note::

  Further standard and custom vector extensions will extend these
  fields to support a greater variety of data types.
  
..
  NOTE: It is anticipated that an extended 64-bit instruction encoding
  would allow these fields to be specified statically in the instruction
  encoding.


.. note::

  It is anticipated that an extended 64-bit instruction encoding
  would allow these fields to be specified statically in the instruction
  encoding.
  

============================================
Vector selected element width `vsew[2:0]`
============================================

..
  The value in `vsew` sets the dynamic *selected* *element* *width*
  (SEW).  By default, a vector register is viewed as being divided into
  VLEN/SEW elements.


The value in `vsew` sets the dynamic *selected element width* (SEW).
By default, a vector register is viewed as being divided into
VLEN/SEW elements.

..
  .vsew[2:0] (selected element width) encoding
  [cols="1,1,1,1,>13"]
  [%autowidth]
  |===
  3+| vsew[2:0] | SEW |
  
  | 0 | 0 | 0 |    8 |
  | 0 | 0 | 1 |   16 |
  | 0 | 1 | 0 |   32 |
  | 0 | 1 | 1 |   64 |
  | 1 | 0 | 0 |  128 | *Reserved*
  | 1 | 0 | 1 |  256 | *Reserved*
  | 1 | 1 | 0 |  512 | *Reserved*
  | 1 | 1 | 1 | 1024 | *Reserved*
  |===


+-----------+------+----------+
| vsew[2:0] | SEW  |          |
+===+===+===+======+==========+
| 0 | 0 | 0 | 8    |          |
+---+---+---+------+----------+
| 0 | 0 | 1 | 16   |          |
+---+---+---+------+----------+
| 0 | 1 | 0 | 32   |          |
+---+---+---+------+----------+
| 0 | 1 | 1 | 64   |          |
+---+---+---+------+----------+
| 1 | 0 | 0 | 128  | Reserved |
+---+---+---+------+----------+
| 1 | 0 | 1 | 256  | Reserved |
+---+---+---+------+----------+
| 1 | 1 | 0 | 512  | Reserved |
+---+---+---+------+----------+
| 1 | 1 | 1 | 1024 | Reserved |
+---+---+---+------+----------+


..
  NOTE: While it is anticipated the larger `vsew[2:0]` encodings
  (`100`-`111`) will be used to encode larger SEW as shown in table, the
  encodings are formally *reserved* at this point.


.. note::

  While it is anticipated the larger `vsew[2:0]` encodings
  (`100`-`111`) will be used to encode larger SEW as shown in the table,
  the encodings are formally *reserved* at this point.
  
..
  .Example VLEN = 128 bits
  [cols=">,>"]
  [%autowidth]
  |===
  | SEW | Elements per vector register
  
  | 64 |  2
  | 32 |  4
  | 16 |  8
  |  8 | 16
  |===


Example: VLEN = 128 bits

+-----+------------------------------+
| SEW | Elements per vector register |
+=====+==============================+
| 64  | 2                            |
+-----+------------------------------+
| 32  | 4                            |
+-----+------------------------------+
| 16  | 8                            |
+-----+------------------------------+
| 8   | 16                           |
+-----+------------------------------+
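
The encoding table and the VLEN = 128 example can be reproduced with a few
lines of code. This is an illustrative sketch; the function names are
hypothetical, and the reserved encodings are rejected rather than decoded
to the anticipated larger widths.

.. code-block:: python

   def sew_from_vsew(vsew):
       """Decode vsew[2:0] to SEW in bits; encodings 100-111 are reserved."""
       if not 0 <= vsew <= 0b011:
           raise ValueError("reserved vsew encoding")
       return 8 << vsew


   def elements_per_register(vlen, sew):
       """A single vector register is viewed as VLEN/SEW elements."""
       return vlen // sew


   # Elements per register for VLEN=128 and vsew = 0, 1, 2, 3.
   counts = [elements_per_register(128, sew_from_vsew(v)) for v in range(4)]

``counts`` lists the element counts for SEW = 8, 16, 32, and 64 in that
order, the reverse of the row order in the table above.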


..
  The supported element width may vary with LMUL, but profiles may
  mandate the minimum SEW that must be supported with LMUL=1.


The supported element width may vary with LMUL, but profiles may
mandate the minimum SEW that must be supported with LMUL=1.

..
  NOTE: Some implementations may support larger SEWs only when bits from
  multiple vector registers are combined.  Software that relies on large
  SEW should attempt to use the largest LMUL, and hence the fewest
  vector register groups, to increase the number of implementations on
  which the code will run. The `vill` bit in `vtype` should be checked
  after setting `vtype` to see if the configuration is supported, and an
  alternate code path should be provided if it is not. Alternatively, a
  profile can mandate the minimum SEW at each LMUL setting.


.. note::

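  Some implementations may support larger SEWs only when bits from
  multiple vector registers are combined.  Software that relies on large
  SEW should attempt to use the largest LMUL, and hence the fewest
  vector register groups, to increase the number of implementations on
  which the code will run.  The `vill` bit in `vtype` should be checked
  after setting `vtype` to see if the configuration is supported, and an
  alternate code path should be provided if it is not.  Alternatively, a
  profile can mandate the minimum SEW at each LMUL setting.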
  

=========================================================
Vector register grouping (`vlmul[2:0]`)
=========================================================

..
  Multiple vector registers can be grouped together, so that a single
  vector instruction can operate on multiple vector registers.  The term
  *vector* *register* *group* is used herein to refer to one or more
  vector registers used as a single operand to a vector instruction.
  Vector register groups allow double-width or larger elements to be
  operated on with the same vector length as selected-width elements.
  Vector register groups also provide greater execution efficiency for
  longer application vectors.


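Multiple vector registers can be grouped together, so that a single
vector instruction can operate on multiple vector registers.  The term
*vector register group* is used herein to refer to one or more
vector registers used as a single operand to a vector instruction.
Vector register groups allow double-width or larger elements to be
operated on with the same vector length as selected-width elements.
Vector register groups also provide greater execution efficiency for
longer application vectors.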

..
  The vector length multiplier, *LMUL*, when greater than 1, represents
  the default number of vector registers that are combined to form a
  vector register group.  Implementations must support LMUL integer values of 1,2,4,8.


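The vector length multiplier, *LMUL*, when greater than 1, represents
the default number of vector registers that are combined to form a
vector register group.  Implementations must support LMUL integer
values of 1, 2, 4, and 8.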

..
  LMUL can also be a fractional value, reducing the number of bits used
  in a vector register.  LMUL can have fractional values 1/2, 1/4, 1/8.
  Fractional LMUL is used to increase the number of usable architectural
  registers when operating on mixed-width values, by not requiring that
  larger-width vectors occupy multiple vector registers. Instead, wider
  values can occupy a single vector register and narrower values can
  occupy a fraction of a vector register.


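LMUL can also be a fractional value, reducing the number of bits used
in a vector register.  LMUL can have fractional values 1/2, 1/4, and 1/8.
Fractional LMUL is used to increase the number of usable architectural
registers when operating on mixed-width values, by not requiring that
larger-width vectors occupy multiple vector registers.  Instead, wider
values can occupy a single vector register and narrower values can
occupy a fraction of a vector register.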

..
  Implementations must support fractional LMUL settings for LMUL {ge}
  SEW~LMUL1MIN~/SEW~LMUL1MAX~, where SEW~LMUL1MIN~ is the narrowest
  supported SEW value at LMUL=1 and SEW~LMUL1MAX~ is the widest
  supported SEW value at LMUL=1.  An attempt to set an unsupported SEW
  and LMUL configuration sets the `vill` bit in `vtype`.


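Implementations must support fractional LMUL settings for LMUL ≥
SEW\ :sub:`LMUL1MIN`/SEW\ :sub:`LMUL1MAX`, where SEW\ :sub:`LMUL1MIN` is
the narrowest supported SEW value at LMUL=1 and SEW\ :sub:`LMUL1MAX` is
the widest supported SEW value at LMUL=1.  An attempt to set an
unsupported SEW and LMUL configuration sets the `vill` bit in `vtype`.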

..
  For a given supported fractional LMUL setting, implementations must support
  SEW settings between SEW~LMUL1MIN~ and LMUL * SEW~LMUL1MAX~, inclusive.


For a given supported fractional LMUL setting, implementations must support
SEW settings between SEW\ :sub:`LMUL1MIN` and LMUL * SEW\ :sub:`LMUL1MAX`, inclusive.
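
The two requirements on fractional LMUL can be worked through numerically.
The sketch below is illustrative only; the function names are hypothetical,
and ``sew_min`` and ``sew_max`` stand for the narrowest and widest SEW
supported at LMUL=1.

.. code-block:: python

   from fractions import Fraction

   FRACTIONAL_LMULS = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 8))


   def required_fractional_lmuls(sew_min, sew_max):
       """Fractional LMUL settings that must be supported: LMUL >= min/max."""
       floor = Fraction(sew_min, sew_max)
       return [lmul for lmul in FRACTIONAL_LMULS if lmul >= floor]


   def required_sew_range(lmul, sew_min, sew_max):
       """Inclusive SEW range that must be supported at a fractional LMUL."""
       return sew_min, int(lmul * sew_max)

For an implementation supporting SEW from 8 to 32 at LMUL=1, only LMUL=1/2
and LMUL=1/4 are required, and at LMUL=1/4 the required range collapses to
SEW=8 alone.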

..
  NOTE: Requiring LMUL {ge} SEW~LMUL1MIN~/SEW~LMUL1MAX~ allows software
  operating on mixed-width elements to only use a single vector register
  to hold the wider elements, with fractional
  LMUL used to hold narrower elements.  When LMUL <
  SEW~LMUL1MIN~/SEW~LMUL1MAX~, there is no guarantee an implementation
  would have enough bits in the fractional vector register to store at
  least one element, as VLEN=SEW~LMUL1MAX~ is a valid implementation
  choice.


.. note::

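  Requiring LMUL ≥ SEW\ :sub:`LMUL1MIN`/SEW\ :sub:`LMUL1MAX` allows software
  operating on mixed-width elements to only use a single vector register
  to hold the wider elements, with fractional LMUL used to hold narrower
  elements.  When LMUL < SEW\ :sub:`LMUL1MIN`/SEW\ :sub:`LMUL1MAX`, there is
  no guarantee an implementation would have enough bits in the fractional
  vector register to store at least one element, as
  VLEN=SEW\ :sub:`LMUL1MAX` is a valid implementation choice.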
  
..
  NOTE: The constraint is written using SEW~LMUL1MAX~ and not ELEN
  because some systems might only support larger SEW values for LMUL>1.
  Note that in these cases, the constraint ensures that no more than a
  single vector register is needed to hold the widest-supported element
  that can be held in a single vector register, when code is also
  performing operations on narrower widths.


.. note::

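  The constraint is written using SEW\ :sub:`LMUL1MAX` and not ELEN
  because some systems might only support larger SEW values for LMUL>1.
  Note that in these cases, the constraint ensures that no more than a
  single vector register is needed to hold the widest-supported element
  that can be held in a single vector register, when code is also
  performing operations on narrower widths.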
  
..
  The use of `vtype` encodings with LMUL < SEW~LMUL1MIN~/SEW~LMUL1MAX~ is
  **reserved**, but implementations can set `vill` if they do not
  support these configurations.


The use of `vtype` encodings with LMUL < SEW\ :sub:`LMUL1MIN`/SEW\ :sub:`LMUL1MAX` is
**reserved**, but implementations can set `vill` if they do not
support these configurations.

..
  NOTE: Requiring all implementations to set `vill` in this case would
  prohibit future use of this case in an extension, so to allow for
  a future definition of LMUL<SEW~LMUL1MIN~/SEW~LMUL1MAX~ behavior, we consider
  the use of this case to be **reserved**.


.. note::

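  Requiring all implementations to set `vill` in this case would
  prohibit future use of this case in an extension, so to allow for
  a future definition of LMUL < SEW\ :sub:`LMUL1MIN`/SEW\ :sub:`LMUL1MAX`
  behavior, we consider the use of this case to be **reserved**.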
  
..
  NOTE: It is recommended that assemblers provide a warning (not an
  error) if a `vsetvli` instruction attempts to write an LMUL < SEW~LMUL1MIN~/SEW~LMUL1MAX~.


.. note::

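  It is recommended that assemblers provide a warning (not an error) if
  a `vsetvli` instruction attempts to write an
  LMUL < SEW\ :sub:`LMUL1MIN`/SEW\ :sub:`LMUL1MAX`.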
  
..
  LMUL is set by the signed `vlmul` field in `vtype` (LMUL =
  2^`vlmul[2:0]`^).


LMUL is set by the signed `vlmul` field in `vtype` (LMUL = 2\ :sup:`vlmul[2:0]`).

..
  The derived value VLMAX = LMUL*VLEN/SEW represents the maximum number
  of elements that can be operated on with a single vector instruction
  given the current SEW and LMUL settings as shown in the table below.


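The derived value VLMAX = LMUL*VLEN/SEW represents the maximum number
of elements that can be operated on with a single vector instruction
given the current SEW and LMUL settings, as shown in the table below.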

+------------+------+---------+------------+-----------------------------------+
| vlmul[2:0] | LMUL | #groups | VLMAX      | Registers grouped with register n |
+====+===+===+======+=========+============+===================================+
| 1  | 0 | 0 | -    | -       | -          | reserved                          |
+----+---+---+------+---------+------------+-----------------------------------+
| 1  | 0 | 1 | 1/8  | 32      | VLEN/SEW/8 | v n (single register in group)    |
+----+---+---+------+---------+------------+-----------------------------------+
| 1  | 1 | 0 | 1/4  | 32      | VLEN/SEW/4 | v n (single register in group)    |
+----+---+---+------+---------+------------+-----------------------------------+
| 1  | 1 | 1 | 1/2  | 32      | VLEN/SEW/2 | v n (single register in group)    |
+----+---+---+------+---------+------------+-----------------------------------+
| 0  | 0 | 0 | 1    | 32      | VLEN/SEW   | v n (single register in group)    |
+----+---+---+------+---------+------------+-----------------------------------+
| 0  | 0 | 1 | 2    | 16      | 2*VLEN/SEW | v n, v n+1                        |
+----+---+---+------+---------+------------+-----------------------------------+
| 0  | 1 | 0 | 4    | 8       | 4*VLEN/SEW | v n, …, v n+3                     |
+----+---+---+------+---------+------------+-----------------------------------+
| 0  | 1 | 1 | 8    | 4       | 8*VLEN/SEW | v n, …, v n+7                     |
+----+---+---+------+---------+------------+-----------------------------------+


..
  When LMUL=2, the vector register group contains vector register `v`
  **n** and vector register `v` **n**+1, providing twice the vector
  length in bits.  Instructions specifying an LMUL=2 vector register group
  with an odd-numbered vector register are reserved.


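When LMUL=2, the vector register group contains vector register `v`
**n** and vector register `v` **n**\ +1, providing twice the vector
length in bits.  Instructions specifying an LMUL=2 vector register group
with an odd-numbered vector register are reserved.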

..
  When LMUL=4, the vector register group contains four vector registers,
  and instructions specifying an LMUL=4 vector register group using vector
  register numbers that are not multiples of four are reserved.


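When LMUL=4, the vector register group contains four vector registers,
and instructions specifying an LMUL=4 vector register group using vector
register numbers that are not multiples of four are reserved.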

..
  When LMUL=8, the vector register group contains eight vector
  registers, and instructions specifying an LMUL=8 vector register group
  using register numbers that are not multiples of eight are reserved.


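When LMUL=8, the vector register group contains eight vector
registers, and instructions specifying an LMUL=8 vector register group
using register numbers that are not multiples of eight are reserved.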
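
The alignment rule for LMUL=2, 4, and 8 can be summarized in one helper.
This is a minimal sketch with a hypothetical function name; it raises an
error for the reserved cases, whereas real hardware behavior for reserved
encodings is not defined here.

.. code-block:: python

   def register_group(n, lmul):
       """Return the register numbers in the group that starts at v<n>."""
       if lmul not in (1, 2, 4, 8):
           raise ValueError("integer LMUL must be 1, 2, 4, or 8")
       if not 0 <= n <= 31:
           raise ValueError("vector register number out of range")
       if n % lmul != 0:
           # The base register must be a multiple of LMUL.
           raise ValueError("reserved: misaligned vector register group")
       return list(range(n, n + lmul))

For example, ``register_group(8, 4)`` covers `v8` through `v11`, while a
base of `v3` with LMUL=2 is rejected as reserved.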

..
  Mask registers are always contained in a single vector register,
  regardless of LMUL.


Mask registers are always contained in a single vector register, regardless of LMUL.

.. _sec-agnostic:


================================================================
Vector tail agnostic and vector mask agnostic `vta` and `vma`
================================================================

..
  These two bits modify the behavior of destination tail elements and
  destination inactive masked-off elements respectively during the
  execution of vector instructions.  The tail and inactive sets contain
  element positions that are not receiving new results during a vector
  operation, as defined in Section :ref:`sec-inactive-defs` .


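These two bits modify the behavior of destination tail elements and
destination inactive masked-off elements respectively during the
execution of vector instructions.  The tail and inactive sets contain
element positions that are not receiving new results during a vector
operation, as defined in Section :ref:`sec-inactive-defs`.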


All systems must support all four options:

..
  [cols="1,1,3,3"]
  [%autowidth]
  |===
  | `vta` | `vma` | Tail Elements | Inactive Elements
  
  |   0   |   0   | undisturbed   | undisturbed
  |   0   |   1   | undisturbed   | agnostic
  |   1   |   0   | agnostic      | undisturbed
  |   1   |   1   | agnostic      | agnostic
  |===


+-----+-----+---------------+-------------------+
| vta | vma | Tail elements | Inactive elements |
+=====+=====+===============+===================+
| 0   | 0   | undisturbed   | undisturbed       |
+-----+-----+---------------+-------------------+
| 0   | 1   | undisturbed   | agnostic          |
+-----+-----+---------------+-------------------+
| 1   | 0   | agnostic      | undisturbed       |
+-----+-----+---------------+-------------------+
| 1   | 1   | agnostic      | agnostic          |
+-----+-----+---------------+-------------------+


..
  When a set is marked undisturbed, the corresponding set of destination
  elements in a vector register group retain the value they previously
  held.   Mask destination values are always treated as tail-agnostic,
  regardless of the setting of `vta`.


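When a set is marked undisturbed, the corresponding set of destination
elements in a vector register group retain the value they previously
held.  Mask destination values are always treated as tail-agnostic,
regardless of the setting of `vta`.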

..
  When a set is marked agnostic, the corresponding set of destination
  elements in any vector destination operand can either retain the value
  they previously held, or are overwritten with 1s.  Within a single vector
  instruction, each destination element can be either left undisturbed
  or overwritten with 1s, in any combination, and the pattern of
  undisturbed or overwritten with 1s is not required to be deterministic
  when the instruction is executed with the same inputs.  In addition,
  except for mask load instructions, any element in the tail of a mask
  result can also be written with the value the mask-producing operation
  would have calculated with `vl`=VLMAX.


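When a set is marked agnostic, the corresponding set of destination
elements in any vector destination operand can either retain the value
they previously held, or be overwritten with 1s.  Within a single vector
instruction, each destination element can be either left undisturbed
or overwritten with 1s, in any combination, and the pattern of
undisturbed or overwritten with 1s is not required to be deterministic
when the instruction is executed with the same inputs.  In addition,
except for mask load instructions, any element in the tail of a mask
result can also be written with the value the mask-producing operation
would have calculated with `vl`=VLMAX.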
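
The undisturbed and agnostic policies can be illustrated with a small
software model of a destination write. This is a sketch only: it assumes
SEW=32 for the all-1s pattern, ignores `vstart`, does not model the special
case for mask results, and uses a hypothetical function name. The random
choice stands in for the freedom an implementation has on each agnostic
element.

.. code-block:: python

   import random

   ALL_ONES = 0xFFFFFFFF  # all-1s pattern for an assumed SEW of 32 bits


   def write_destination(old, new, vl, mask, vta, vma):
       """Return the destination elements after one masked vector operation."""
       result = []
       for i, (old_value, new_value) in enumerate(zip(old, new)):
           if i < vl and mask[i]:
               result.append(new_value)  # active element: gets the new result
               continue
           # Tail elements follow vta; inactive body elements follow vma.
           agnostic = vta if i >= vl else vma
           if agnostic:
               # Either outcome is legal per element and need not be repeatable.
               result.append(random.choice([old_value, ALL_ONES]))
           else:
               result.append(old_value)  # undisturbed
       return result

With ``vta=0`` and ``vma=0`` the result is fully determined; with either
bit set, portable software must not depend on the contents of the
corresponding elements.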

..
  NOTE: The agnostic policy was added to accommodate machines with vector
  register renaming, and/or that have deeply temporal vector registers.
  With an undisturbed policy, all elements would have to be read from
  the old physical destination vector register to be copied into the new
  physical destination vector register.  This causes an inefficiency
  when these inactive or tail values are not required for subsequent
  calculations.


.. note::

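  The agnostic policy was added to accommodate machines with vector
  register renaming, and/or that have deeply temporal vector registers.
  With an undisturbed policy, all elements would have to be read from
  the old physical destination vector register to be copied into the new
  physical destination vector register.  This causes an inefficiency
  when these inactive or tail values are not required for subsequent
  calculations.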
  
..
  NOTE: Mask tails are always treated as agnostic to reduce complexity
  of managing mask data, which can be written at bit granularity.  There
  appears to be little software need to support tail-undisturbed for
  mask register values.  Allowing mask-generating instructions to write
  back the result of the instruction avoids the need for logic to mask
  out the tail, except mask loads cannot write memory values to
  destination mask tails as this would imply accessing memory past
  software intent.


.. note::

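  Mask tails are always treated as agnostic to reduce complexity
  of managing mask data, which can be written at bit granularity.  There
  appears to be little software need to support tail-undisturbed for
  mask register values.  Allowing mask-generating instructions to write
  back the result of the instruction avoids the need for logic to mask
  out the tail, except mask loads cannot write memory values to
  destination mask tails as this would imply accessing memory past
  software intent.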
  
..
  NOTE: The value of all 1s instead of all 0s was chosen for the
  overwrite value to discourage software developers from depending on
  the value written.


.. note::

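  The value of all 1s instead of all 0s was chosen for the
  overwrite value to discourage software developers from depending on
  the value written.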
  
..
  NOTE: A simple in-order implementation can ignore the settings and
  simply execute all vector instructions using the undisturbed
  policy. The `vta` and `vma` state bits must still be provided in
  `vtype` for compatibility and to support thread migration.


.. note::

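  A simple in-order implementation can ignore the settings and
  simply execute all vector instructions using the undisturbed
  policy.  The `vta` and `vma` state bits must still be provided in
  `vtype` for compatibility and to support thread migration.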
  
..
  NOTE: An out-of-order implementation can choose to implement
  tail-agnostic + mask-agnostic using tail-agnostic + mask-undisturbed
  to reduce implementation complexity.


.. note::

  An out-of-order implementation can choose to implement
  tail-agnostic + mask-agnostic using tail-agnostic + mask-undisturbed
  to reduce implementation complexity.
  
..
  NOTE: The definition of agnostic result policy is left loose to
  accommodate migrating application threads between harts on a small
  in-order core (which probably leaves agnostic regions undisturbed) and
  harts on a larger out-of-order core with register renaming (which
  probably overwrites agnostic elements with 1s).  As it might be
  necessary to restart in the middle, we allow arbitrary mixing of
  agnostic policies within a single vector instruction.  This allowed
  mixing of policies also enables implementations that might change
  policies for different granules of a vector register, for example,
  using undisturbed within a granule that is actively operated on but
  renaming to all 1s for granules in the tail.


.. note::

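  The definition of agnostic result policy is left loose to
  accommodate migrating application threads between harts on a small
  in-order core (which probably leaves agnostic regions undisturbed) and
  harts on a larger out-of-order core with register renaming (which
  probably overwrites agnostic elements with 1s).  As it might be
  necessary to restart in the middle, we allow arbitrary mixing of
  agnostic policies within a single vector instruction.  This allowed
  mixing of policies also enables implementations that might change
  policies for different granules of a vector register, for example,
  using undisturbed within a granule that is actively operated on but
  renaming to all 1s for granules in the tail.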
  
..
  The assembly syntax adds two flags to the `vsetvli` instruction:


The assembly syntax adds two flags to the `vsetvli` instruction:

::

   ta   # Tail agnostic
   tu   # Tail undisturbed
   ma   # Mask agnostic
   mu   # Mask undisturbed
  
   vsetvli t0, a0, e32, m4, ta, ma   # Tail agnostic, mask agnostic
   vsetvli t0, a0, e32, m4, tu, ma   # Tail undisturbed, mask agnostic
   vsetvli t0, a0, e32, m4, ta, mu   # Tail agnostic, mask undisturbed
   vsetvli t0, a0, e32, m4, tu, mu   # Tail undisturbed, mask undisturbed
  

..
  NOTE: To maintain backward compatibility in the short term and reduce
  software churn in the move to 0.9, when these flags are not specified
  on a `vsetvli`, they should default to
  mask-undisturbed/tail-undisturbed.  The use of `vsetvli` without these
  flags should be deprecated, however, such that the specifying a flag
  setting becomes mandatory.  If anything, the default should be
  tail-agnostic/mask-agnostic, so software has to specify when it cares
  about the non-participating elements, but given the historical meaning
  of the instruction prior to introduction of these flags, it is safest
  to always require them in future assembly code.


.. note::

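  To maintain backward compatibility in the short term and reduce
  software churn in the move to 0.9, when these flags are not specified
  on a `vsetvli`, they should default to
  mask-undisturbed/tail-undisturbed.  The use of `vsetvli` without these
  flags should be deprecated, however, such that specifying a flag
  setting becomes mandatory.  If anything, the default should be
  tail-agnostic/mask-agnostic, so software has to specify when it cares
  about the non-participating elements, but given the historical meaning
  of the instruction prior to introduction of these flags, it is safest
  to always require them in future assembly code.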
  

==============================
Vector Type Illegal `vill`
==============================

..
  The `vill` bit is used to encode that a previous `vset{i}vl{i}`
  instruction attempted to write an unsupported value to `vtype`.


The `vill` bit is used to encode that a previous `vset{i}vl{i}`
instruction attempted to write an unsupported value to `vtype`.

..
  NOTE: The `vill` bit is held in bit XLEN-1 of the CSR to support
  checking for illegal values with a branch on the sign bit.


.. note::

  The `vill` bit is held in bit XLEN-1 of the CSR to support
  checking for illegal values with a branch on the sign bit.
  
..
  If the `vill` bit is set, then any attempt to execute a vector instruction
  that depends upon `vtype` will raise an illegal-instruction exception.


If the `vill` bit is set, then any attempt to execute a vector instruction
that depends upon `vtype` will raise an illegal-instruction exception.

..
  NOTE: `vset{i}vl{i}` and whole-register loads, stores, and moves do not depend
  upon `vtype`.


.. note::

  `vset{i}vl{i}` and whole-register loads, stores, and moves do not
  depend upon `vtype`.
  
..
  When the `vill` bit is set, the other XLEN-1 bits in `vtype` shall be
  zero.


When the `vill` bit is set, the other XLEN-1 bits in `vtype` shall be
zero.
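
The sign-bit check and the all-zero rule can be sketched as follows. This
is an illustrative model assuming XLEN=64; the function names and the
``supported`` flag are hypothetical stand-ins for whatever check an
implementation performs on a requested configuration.

.. code-block:: python

   XLEN = 64  # assumed register width for this sketch


   def write_vtype(requested, supported):
       """Value held in vtype after a vset{i}vl{i} requests a configuration."""
       if supported:
           return requested
       # Unsupported: vill (bit XLEN-1) is set and all other bits are zero.
       return 1 << (XLEN - 1)


   def as_signed(value):
       """Interpret an XLEN-bit register value as a signed integer."""
       return value - (1 << XLEN) if value >> (XLEN - 1) else value


   def vtype_is_illegal(vtype):
       """Equivalent of a branch on the sign bit of vtype."""
       return as_signed(vtype) < 0

Software can therefore read `vtype` after a configuration instruction and
branch on a negative value to reach an alternate code path.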


*********************************
Vector Length Register `vl`
*********************************

..
  The *XLEN*-bit-wide read-only `vl` CSR can only be updated by the
  `vset{i}vl{i}` instructions, and the *fault-only-first* vector load
  instruction variants.


The *XLEN*-bit-wide read-only `vl` CSR can only be updated by the
`vset{i}vl{i}` instructions and the *fault-only-first* vector load
instruction variants.

..
  The `vl` register holds an unsigned integer specifying the number of
  elements to be updated with results from a vector instruction, as
  further detailed in Section :ref:`sec-inactive-defs` .


The `vl` register holds an unsigned integer specifying the number of
elements to be updated with results from a vector instruction, as
further detailed in Section :ref:`sec-inactive-defs`.

..
  NOTE: The number of bits implemented in `vl` depends on the
  implementation's maximum vector length of the smallest supported
  type. The smallest vector implementation with VLEN=32 and supporting
  SEW=8 would need at least six bits in `vl` to hold the values 0-32
  (VLEN=32, with LMUL=8 and SEW=8, yields VLMAX=32).


.. note::

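  The number of bits implemented in `vl` depends on the
  implementation's maximum vector length of the smallest supported
  type.  The smallest vector implementation with VLEN=32 and supporting
  SEW=8 would need at least six bits in `vl` to hold the values 0-32
  (VLEN=32, with LMUL=8 and SEW=8, yields VLMAX=32).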
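
The six-bit figure follows from the need to represent every value from 0
up to and including VLMAX. The sketch below is illustrative, with a
hypothetical function name, and assumes the largest LMUL is 8.

.. code-block:: python

   def vl_bits_needed(vlen, min_sew, max_lmul=8):
       """Bits required in vl to hold every value from 0 to VLMAX inclusive."""
       largest_vlmax = max_lmul * vlen // min_sew
       return largest_vlmax.bit_length()

Because `vl` must be able to hold VLMAX itself, it needs one more bit than
an element index, which only ranges up to VLMAX-1.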
  

*********************************
Vector Byte Length `vlenb`
*********************************

..
  The *XLEN*-bit-wide read-only CSR `vlenb` holds the value VLEN/8,
  i.e., the vector register length in bytes.


The *XLEN*-bit-wide read-only CSR `vlenb` holds the value VLEN/8,
i.e., the vector register length in bytes.

..
  NOTE: The value in `vlenb` is a design-time constant in any
  implementation.


.. note::

  The value in `vlenb` is a design-time constant in any
  implementation.
  
..
  NOTE: Without this CSR, several instructions are needed to calculate
  VLEN in bytes, and the code has to disturb current `vl` and `vtype`
  settings which require them to be saved and restored.


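.. note::

  Without this CSR, several instructions are needed to calculate
  VLEN in bytes, and the code has to disturb the current `vl` and `vtype`
  settings, which then have to be saved and restored.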
  

*******************************************************
Vector Start Index CSR `vstart`
*******************************************************

..
  The `vstart` read-write CSR specifies the index of the first element
  to be executed by a vector instruction, as described in Section
  :ref:`sec-inactive-defs` .


The `vstart` read-write CSR specifies the index of the first element
to be executed by a vector instruction, as described in Section
:ref:`sec-inactive-defs`.

..
  Normally, `vstart` is only written by hardware on a trap on a vector
  instruction, with the `vstart` value representing the element on which
  the trap was taken (either a synchronous exception or an asynchronous
  interrupt), and at which execution should resume after a resumable
  trap is handled.


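Normally, `vstart` is only written by hardware on a trap on a vector
instruction, with the `vstart` value representing the element on which
the trap was taken (either a synchronous exception or an asynchronous
interrupt), and at which execution should resume after a resumable
trap is handled.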

..
  All vector instructions are defined to begin execution with the
  element number given in the `vstart` CSR, leaving earlier elements in
  the destination vector undisturbed, and to reset the `vstart` CSR to
  zero at the end of execution.


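All vector instructions are defined to begin execution with the
element number given in the `vstart` CSR, leaving earlier elements in
the destination vector undisturbed, and to reset the `vstart` CSR to
zero at the end of execution.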

..
  NOTE: All vector instructions, including `vset{i}vl{i}`, reset the `vstart`
  CSR to zero.


.. note::

  All vector instructions, including `vset{i}vl{i}`, reset the `vstart`
  CSR to zero.
  
..
  `vstart` is not modified by vector instructions that raise illegal-instruction
  exceptions.


`vstart` is not modified by vector instructions that raise illegal-instruction
exceptions.

..
  The `vstart` CSR is defined to have only enough writable bits to hold
  the largest element index (one less than the maximum VLMAX).


The `vstart` CSR is defined to have only enough writable bits to hold
the largest element index (one less than the maximum VLMAX).

..
  NOTE: The maximum vector length is obtained with the largest LMUL
  setting (8) and the smallest SEW setting (8), so VLMAX*max = 8*VLEN/8
  = VLEN.  For example, for VLEN=256, `vstart` would have 8 bits to
  represent indices from 0 through 255.


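.. note::

  The maximum vector length is obtained with the largest LMUL
  setting (8) and the smallest SEW setting (8), so the maximum VLMAX is
  8*VLEN/8 = VLEN.  For example, for VLEN=256, `vstart` would have 8 bits to
  represent indices from 0 through 255.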
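
A minimal sketch of the bit-count reasoning in the note above, assuming the
largest LMUL is 8 and the smallest SEW is 8. The function name is hypothetical.

.. code-block:: python

  def vstart_bits_needed(vlen: int) -> int:
      """Writable bits needed to hold the largest element index, VLMAX_max - 1."""
      vlmax_max = 8 * vlen // 8  # LMUL=8 and SEW=8, so VLMAX_max equals VLEN
      return (vlmax_max - 1).bit_length()

Unlike `vl`, which must be able to hold the value VLMAX itself, `vstart` only
needs to index elements 0 through VLMAX-1, so it can be one bit narrower.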
  
..
  The use of `vstart` values greater than the largest element index for
  the current SEW setting is reserved.


The use of `vstart` values greater than the largest element index for
the current SEW setting is reserved.

..
  NOTE: It is recommended that implementations trap if `vstart` is out
  of bounds.  It is not required to trap, as a possible future use of
  upper `vstart` bits is to store imprecise trap information.


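.. note::

  It is recommended that implementations trap if `vstart` is out
  of bounds.  It is not required to trap, as a possible future use of
  upper `vstart` bits is to store imprecise trap information.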
  

..
  The `vstart` CSR is writable by unprivileged code, but non-zero
  `vstart` values may cause vector instructions to run substantially
  slower on some implementations, so `vstart` should not be used by
  application programmers.  A few vector instructions cannot be
  executed with a non-zero `vstart` value and will raise an illegal
  instruction exception as defined below.


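The `vstart` CSR is writable by unprivileged code, but non-zero
`vstart` values may cause vector instructions to run substantially
slower on some implementations, so `vstart` should not be used by
application programmers.  A few vector instructions cannot be
executed with a non-zero `vstart` value and will raise an
illegal-instruction exception as defined below.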

..
  NOTE: Making `vstart` visible to unprivileged code supports user-level
  threading libraries.


.. note::

  Making `vstart` visible to unprivileged code supports user-level
  threading libraries.
  
..
  Implementations are permitted to raise illegal instruction exceptions when
  attempting to execute a vector instruction with a value of `vstart` that the
  implementation can never produce when executing that same instruction with
  the same `vtype` setting.


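Implementations are permitted to raise illegal-instruction exceptions when
attempting to execute a vector instruction with a value of `vstart` that the
implementation can never produce when executing that same instruction with
the same `vtype` setting.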

..
  NOTE: For example, some implementations will never take interrupts during
  execution of a vector arithmetic instruction, instead waiting until the
  instruction completes to take the interrupt.  Such implementations are
  permitted to raise an illegal instruction exception when attempting to execute
  a vector arithmetic instruction when `vstart` is nonzero.


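.. note::

  For example, some implementations will never take interrupts during
  execution of a vector arithmetic instruction, instead waiting until the
  instruction completes to take the interrupt.  Such implementations are
  permitted to raise an illegal-instruction exception when attempting to execute
  a vector arithmetic instruction when `vstart` is nonzero.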
  
..
  NOTE: When migrating a software thread between two harts with
  different microarchitectures, the `vstart` value might not be
  supported by the new hart microarchitecture.  The runtime on the
  receiving hart might then have to emulate instruction execution to a
  supported vstart element position.  Alternatively, migration events
  can be constrained to only occur at mutually supported `vstart`
  locations.


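.. note::

  When migrating a software thread between two harts with
  different microarchitectures, the `vstart` value might not be
  supported by the new hart microarchitecture.  The runtime on the
  receiving hart might then have to emulate instruction execution to a
  supported `vstart` element position.  Alternatively, migration events
  can be constrained to only occur at mutually supported `vstart`
  locations.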
  

**************************************************************
Vector Fixed-Point Rounding Mode Register `vxrm`
**************************************************************

..
  The vector fixed-point rounding-mode register holds a two-bit
  read-write rounding-mode field.  The vector fixed-point rounding-mode
  is given a separate CSR address to allow independent access, but is
  also reflected as a field in `vcsr`.


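The vector fixed-point rounding-mode register holds a two-bit
read-write rounding-mode field.  The vector fixed-point rounding mode
is given a separate CSR address to allow independent access, but is
also reflected as a field in `vcsr`.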

..
  The fixed-point rounding algorithm is specified as follows.
  Suppose the pre-rounding result is `v`, and `d` bits of that result are to be
  rounded off.
  Then the rounded result is `(v >> d) + r`, where `r` depends on the rounding
  mode as specified in the following table.


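The fixed-point rounding algorithm is specified as follows.
Suppose the pre-rounding result is `v`, and `d` bits of that result are to be
rounded off.
Then the rounded result is `(v >> d) + r`, where `r` depends on the rounding
mode as specified in the following table.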

..


+-----------+--------------+--------------------------------------------+------------------------------+
| vxrm[1:0] | Abbreviation | Rounding Mode                              | Rounding increment, r        |
+=====+=====+==============+============================================+==============================+
| 0   | 0   | rnu          | round-to-nearest-up (add +0.5 LSB)         | v[d-1]                       |
+-----+-----+--------------+--------------------------------------------+------------------------------+
| 0   | 1   | rne          | round-to-nearest-even                      | v[d-1] & (v[d-2:0]≠0 | v[d]) |
+-----+-----+--------------+--------------------------------------------+------------------------------+
| 1   | 0   | rdn          | round-down (truncate)                      | 0                            |
+-----+-----+--------------+--------------------------------------------+------------------------------+
| 1   | 1   | rod          | round-to-odd (OR bits into LSB, aka "jam") | !v[d] & v[d-1:0]≠0           |
+-----+-----+--------------+--------------------------------------------+------------------------------+


The following functions are used to represent this rounding operation in the
instruction descriptions below.
::

  roundoff_unsigned(v, d) = (unsigned(v) >> d) + r
  roundoff_signed(v, d) = (signed(v) >> d) + r
  

Bits `vxrm[XLEN-1:2]` should be written as zeros.
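
The rounding table and the two roundoff functions can be modeled in a few lines
of Python. This is only a sketch: the constant and function names are
hypothetical, values are plain Python integers rather than fixed-width
registers, and treating `d = 0` as "no increment" is an assumption of this
example.

.. code-block:: python

  RNU, RNE, RDN, ROD = 0b00, 0b01, 0b10, 0b11  # vxrm[1:0] encodings


  def rounding_increment(v: int, d: int, vxrm: int) -> int:
      """Return the increment r for rounding off the low d bits of v."""
      if d == 0:
          return 0  # assumption: nothing is rounded off, so no increment

      def bit(i: int) -> int:
          return (v >> i) & 1

      if vxrm == RNU:
          return bit(d - 1)
      if vxrm == RNE:
          below = int((v & ((1 << (d - 1)) - 1)) != 0)  # v[d-2:0] != 0
          return bit(d - 1) & (below | bit(d))
      if vxrm == RDN:
          return 0
      if vxrm == ROD:
          discarded = int((v & ((1 << d) - 1)) != 0)  # v[d-1:0] != 0
          return (1 - bit(d)) & discarded
      raise ValueError("vxrm must be a 2-bit value")


  def roundoff_signed(v: int, d: int, vxrm: int) -> int:
      """(signed(v) >> d) + r, relying on Python's arithmetic right shift."""
      return (v >> d) + rounding_increment(v, d, vxrm)


  def roundoff_unsigned(v: int, d: int, vxrm: int) -> int:
      """(unsigned(v) >> d) + r; v is assumed to be non-negative here."""
      return (v >> d) + rounding_increment(v, d, vxrm)

For instance, rounding off two bits of 10 (binary 1010, i.e. 2.5 after the
shift) yields 3 under `rnu` and `rod`, and 2 under `rne` and `rdn`.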

..
  NOTE: A new rounding mode can be set while saving the original
  rounding mode using a single `csrwi` instruction.


.. note::

  A new rounding mode can be set while saving the original
  rounding mode using a single `csrwi` instruction.
  

***************************************************
Vector Fixed-Point Saturation Flag `vxsat`
***************************************************

..
  The `vxsat` CSR holds a single read-write bit that indicates if a
  fixed-point instruction has had to saturate an output value to fit
  into a destination format.


The `vxsat` CSR holds a single read-write bit that indicates if a
fixed-point instruction has had to saturate an output value to fit
into a destination format.


..
  The `vxsat` bit is mirrored in `vcsr`.


The `vxsat` bit is mirrored in `vcsr`.


***************************************************
Vector Control and Status Register `vcsr`
***************************************************

..
  The `vxrm` and `vxsat` separate CSRs can also be accessed via fields
  in the vector control and status CSR, `vcsr`.


The separate `vxrm` and `vxsat` CSRs can also be accessed via fields
in the vector control and status CSR, `vcsr`.

..


+------+-----------+-------------------------------------+
| Bits | Name      | Description                         |
+======+===========+=====================================+
| 2:1  | vxrm[1:0] | Fixed-point rounding mode           |
+------+-----------+-------------------------------------+
| 0    | vxsat     | Fixed-point accrued saturation flag |
+------+-----------+-------------------------------------+
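
The field layout in the table above can be expressed as a pair of small helper
functions. The names `pack_vcsr` and `unpack_vcsr` are hypothetical and only
model the bit positions; they do not perform any CSR access.

.. code-block:: python

  def pack_vcsr(vxrm: int, vxsat: int) -> int:
      """Build a vcsr value: vxrm in bits 2:1 and vxsat in bit 0."""
      return ((vxrm & 0b11) << 1) | (vxsat & 0b1)


  def unpack_vcsr(vcsr: int) -> tuple:
      """Split a vcsr value into its (vxrm, vxsat) fields."""
      return (vcsr >> 1) & 0b11, vcsr & 0b1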


**********************************************
State of Vector Extension at Reset
**********************************************

..
  The vector extension must have a consistent state at reset.  In
  particular, `vtype` and `vl` must have values that can be read and
  then restored with a single `vsetvl` instruction.


The vector extension must have a consistent state at reset.  In
particular, `vtype` and `vl` must have values that can be read and
then restored with a single `vsetvl` instruction.

..
  NOTE: It is recommended that at reset, `vtype.vill` is set, the
  remaining bits in `vtype` are zero, and `vl` is set to zero.


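.. note::

  It is recommended that at reset, `vtype.vill` is set, the
  remaining bits in `vtype` are zero, and `vl` is set to zero.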
  
..
  The `vstart`, `vxrm`, `vxsat` CSRs can have arbitrary values at reset.


The `vstart`, `vxrm`, and `vxsat` CSRs can have arbitrary values at reset.

..
  NOTE: Any use of the vector unit will require an initial `vset{i}vl{i}`,
  which will reset `vstart`.  The `vxrm` and `vxsat` fields should be
  reset explicitly in software before use.


.. note::

  Any use of the vector unit will require an initial `vset{i}vl{i}`,
  which will reset `vstart`.  The `vxrm` and `vxsat` fields should be
  reset explicitly in software before use.
  
..
  The vector registers can have arbitrary values at reset.


The vector registers can have arbitrary values at reset.


#########################################################################
Mapping of Vector Elements to Vector Register State
#########################################################################

..
  The following diagrams illustrate how different width elements are
  packed into the bytes of a vector register depending on the current
  SEW and LMUL settings, as well as implementation VLEN.  Elements are
  packed into each vector register with the least-significant byte in
  the lowest-numbered bits.


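The following diagrams illustrate how different width elements are
packed into the bytes of a vector register depending on the current
SEW and LMUL settings, as well as implementation VLEN.  Elements are
packed into each vector register with the least-significant byte in
the lowest-numbered bits.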


*******************************
Mapping for LMUL = 1
*******************************

..
  When LMUL=1, elements are simply packed in order from the
  least-significant to most-significant bits of the vector register.


When LMUL=1, elements are simply packed in order from the
least-significant to most-significant bits of the vector register.

..
  NOTE: To increase readability, vector register layouts are drawn with
  bytes ordered from right to left with increasing byte address.  Bits
  within an element are numbered in a little-endian format with
  increasing bit index from right to left corresponding to increasing
  magnitude.


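.. note::

  To increase readability, vector register layouts are drawn with
  bytes ordered from right to left with increasing byte address.  Bits
  within an element are numbered in a little-endian format with
  increasing bit index from right to left corresponding to increasing
  magnitude.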
  
::

  LMUL=1 examples.
  
  The element index is given in hexadecimal and is shown placed at the
  least-significant byte of the stored element.
  
   VLEN=32b
  
   Byte         3 2 1 0
  
   SEW=8b       3 2 1 0
   SEW=16b        1   0
   SEW=32b            0
  
   VLEN=64b
  
   Byte        7 6 5 4 3 2 1 0
  
   SEW=8b      7 6 5 4 3 2 1 0
   SEW=16b       3   2   1   0
   SEW=32b           1       0
   SEW=64b                   0
  
   VLEN=128b
  
   Byte        F E D C B A 9 8 7 6 5 4 3 2 1 0
  
   SEW=8b      F E D C B A 9 8 7 6 5 4 3 2 1 0
   SEW=16b       7   6   5   4   3   2   1   0
   SEW=32b           3       2       1       0
   SEW=64b                   1               0
   SEW=128b                                  0
  
   VLEN=256b
  
   Byte     1F1E1D1C1B1A19181716151413121110 F E D C B A 9 8 7 6 5 4 3 2 1 0
  
   SEW=8b   1F1E1D1C1B1A19181716151413121110 F E D C B A 9 8 7 6 5 4 3 2 1 0
   SEW=16b     F   E   D   C   B   A   9   8   7   6   5   4   3   2   1   0
   SEW=32b         7       6       5       4       3       2       1       0
   SEW=64b                 3               2               1               0
   SEW=128b                                1                               0
  

************************************
Mapping for LMUL < 1
************************************

..
  When LMUL < 1, only the first LMUL*VLEN/SEW elements in the vector
  register are used.  The remaining space in the vector register is
  treated as part of the tail, and hence must obey the vta setting.


When LMUL < 1, only the first LMUL*VLEN/SEW elements in the vector
register are used.  The remaining space in the vector register is
treated as part of the tail, and hence must obey the vta setting.

::

   Example, VLEN=128b, LMUL=1/4
  
   Byte        F E D C B A 9 8 7 6 5 4 3 2 1 0
  
   SEW=8b      - - - - - - - - - - - - 3 2 1 0
   SEW=16b       -   -   -   -   -   -   1   0
   SEW=32b           -       -       -       0
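
The element counts in the diagram above follow directly from LMUL*VLEN/SEW.
The sketch below uses Python's `Fraction` type to represent fractional LMUL;
the function name is hypothetical.

.. code-block:: python

  from fractions import Fraction


  def usable_elements(vlen: int, sew: int, lmul) -> int:
      """Number of elements used in one register: LMUL * VLEN / SEW."""
      return int(Fraction(lmul) * vlen / sew)

With VLEN=128 and LMUL=1/4 this gives 4, 2, and 1 elements for SEW=8, 16,
and 32 respectively, matching the three rows of the diagram; the rest of the
register belongs to the tail.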
  

****************************
Mapping for LMUL > 1
****************************

..
  When vector registers are grouped, the elements of the vector register
  group are striped across the constituent vector registers.  The
  elements are packed contiguously in element order in each vector
  register in the group, moving to the next highest-numbered vector
  register in the group once each vector register is filled.


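When vector registers are grouped, the elements of the vector register
group are striped across the constituent vector registers.  The
elements are packed contiguously in element order in each vector
register in the group, moving to the next highest-numbered vector
register in the group once each vector register is filled.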

::

  LMUL > 1 examples
  
   VLEN=32b, SEW=8b, LMUL=2
  
   Byte         3 2 1 0
   v2*n         3 2 1 0
   v2*n+1       7 6 5 4
  
   VLEN=32b, SEW=16b, LMUL=2
  
   Byte         3 2 1 0
   v2*n           1   0
   v2*n+1         3   2
  
   VLEN=32b, SEW=16b, LMUL=4
  
   Byte         3 2 1 0
   v4*n           1   0
   v4*n+1         3   2
   v4*n+2         5   4
   v4*n+3         7   6
  
   VLEN=32b, SEW=32b, LMUL=4
  
   Byte         3 2 1 0
   v4*n               0
   v4*n+1             1
   v4*n+2             2
   v4*n+3             3
  
   VLEN=64b, SEW=32b, LMUL=2
  
   Byte         7 6 5 4 3 2 1 0
   v2*n               1       0
   v2*n+1             3       2
  
   VLEN=64b, SEW=32b, LMUL=4
  
   Byte         7 6 5 4 3 2 1 0
   v4*n               1       0
   v4*n+1             3       2
   v4*n+2             5       4
   v4*n+3             7       6
  
   VLEN=128b, SEW=32b, LMUL=2
  
   Byte        F E D C B A 9 8 7 6 5 4 3 2 1 0
   v2*n              3       2       1       0
   v2*n+1            7       6       5       4
  
   VLEN=128b, SEW=32b, LMUL=4
  
   Byte          F E D C B A 9 8 7 6 5 4 3 2 1 0
   v4*n                3       2       1       0
   v4*n+1              7       6       5       4
   v4*n+2              B       A       9       8
   v4*n+3              F       E       D       C
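
The striping shown in these diagrams can be summarized by a small index
calculation. This is a sketch for the case SEW ≤ VLEN only; the function name
and the `base_reg` parameter (the lowest-numbered register of the group) are
assumptions of this example.

.. code-block:: python

  def locate_element(i: int, vlen: int, sew: int, base_reg: int) -> tuple:
      """Return (register number, byte offset of the element's least-significant byte)."""
      per_reg = vlen // sew            # elements that fit in one vector register
      reg = base_reg + i // per_reg    # move to the next register once one is full
      byte = (i % per_reg) * (sew // 8)
      return reg, byte

For VLEN=128b, SEW=32b, LMUL=4 with the group based at `v8`, element 0xB lands
in `v10` at byte offset 0xC, which agrees with the `v4*n+2` row of the last
diagram.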
  

*************************************************
Mapping across Mixed-Width Operations
*************************************************

..
  The vector ISA is designed to support mixed-width operations without
  requiring explicit additional rearrangement instructions.  The
  recommended software strategy when operating on vectors of different
  precision values is to modify `vtype` dynamically to keep SEW/LMUL
  constant (and hence VLMAX constant).


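The vector ISA is designed to support mixed-width operations without
requiring explicit additional rearrangement instructions.  The
recommended software strategy when operating on vectors of different
precision values is to modify `vtype` dynamically to keep SEW/LMUL
constant (and hence VLMAX constant).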

..
  The following example shows four different packed element widths (8b,
  16b, 32b, 64b) in a VLEN=128b implementation.  The vector register
  grouping factor (LMUL) is increased by the relative element size such
  that each group can hold the same number of vector elements (VLMAX=8
  in this example) to simplify stripmining code.


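The following example shows four different packed element widths (8b,
16b, 32b, 64b) in a VLEN=128b implementation.  The vector register
grouping factor (LMUL) is increased by the relative element size such
that each group can hold the same number of vector elements (VLMAX=8
in this example) to simplify stripmining code.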

::

  Example VLEN=128b, with SEW/LMUL=16
  
  Byte      F E D C B A 9 8 7 6 5 4 3 2 1 0
  vn        - - - - - - - - 7 6 5 4 3 2 1 0  SEW=8b, LMUL=1/2
  
  vn          7   6   5   4   3   2   1   0  SEW=16b, LMUL=1
  
  v2*n            3       2       1       0  SEW=32b, LMUL=2
  v2*n+1          7       6       5       4
  
  v4*n                    1               0  SEW=64b, LMUL=4
  v4*n+1                  3               2
  v4*n+2                  5               4
  v4*n+3                  7               6
  

..
  The following table shows each possible constant SEW/LMUL operating
  point for loops with mixed-width operations.  Each column represents a
  constant SEW/LMUL operating point.  Entries in table are the LMUL
  values that yield that column's SEW/LMUL value for the datawidth on
  that row.  In each column, an LMUL setting for a datawidth indicates
  that it can be aligned with the other datawidths in the same column
  that also have an LMUL setting, such that all have the same VLMAX.


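The following table shows each possible constant SEW/LMUL operating
point for loops with mixed-width operations.  Each column represents a
constant SEW/LMUL operating point.  Entries in the table are the LMUL
values that yield that column's SEW/LMUL value for the data width on
that row.  In each column, an LMUL setting for a data width indicates
that it can be aligned with the other data widths in the same column
that also have an LMUL setting, such that all have the same VLMAX.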

..


+----------+---+---+---+---+-----+-----+-----+-----+-----+-----+------+------+------+------+
| SEW/LMUL | 1 | 2 | 4 | 8 | 16  | 32  | 64  | 128 | 256 | 512 | 1024 | 2048 | 4096 | 8192 |
+==========+===+===+===+===+=====+=====+=====+=====+=====+=====+======+======+======+======+
| SEW= 8   | 8 | 4 | 2 | 1 | 1/2 | 1/4 | 1/8 |     |     |     |      |      |      |      |
+----------+---+---+---+---+-----+-----+-----+-----+-----+-----+------+------+------+------+
| SEW= 16  |   | 8 | 4 | 2 | 1   | 1/2 | 1/4 | 1/8 |     |     |      |      |      |      |
+----------+---+---+---+---+-----+-----+-----+-----+-----+-----+------+------+------+------+
| SEW= 32  |   |   | 8 | 4 | 2   | 1   | 1/2 | 1/4 | 1/8 |     |      |      |      |      |
+----------+---+---+---+---+-----+-----+-----+-----+-----+-----+------+------+------+------+
| SEW= 64  |   |   |   | 8 | 4   | 2   | 1   | 1/2 | 1/4 | 1/8 |      |      |      |      |
+----------+---+---+---+---+-----+-----+-----+-----+-----+-----+------+------+------+------+
| SEW= 128 |   |   |   |   | 8   | 4   | 2   | 1   | 1/2 | 1/4 | 1/8  |      |      |      |
+----------+---+---+---+---+-----+-----+-----+-----+-----+-----+------+------+------+------+
| SEW= 256 |   |   |   |   |     | 8   | 4   | 2   | 1   | 1/2 | 1/4  | 1/8  |      |      |
+----------+---+---+---+---+-----+-----+-----+-----+-----+-----+------+------+------+------+
| SEW= 512 |   |   |   |   |     |     | 8   | 4   | 2   | 1   | 1/2  | 1/4  | 1/8  |      |
+----------+---+---+---+---+-----+-----+-----+-----+-----+-----+------+------+------+------+
| SEW=1024 |   |   |   |   |     |     |     | 8   | 4   | 2   | 1    | 1/2  | 1/4  | 1/8  |
+----------+---+---+---+---+-----+-----+-----+-----+-----+-----+------+------+------+------+
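
Each entry of the table is simply SEW divided by the column's SEW/LMUL ratio,
kept only when the result is one of the LMUL settings 1/8 through 8. The
sketch below reproduces a column; the names are hypothetical and the default
set of element widths is an assumption of this example.

.. code-block:: python

  from fractions import Fraction

  LEGAL_LMUL = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 2),
                Fraction(1), Fraction(2), Fraction(4), Fraction(8)]


  def lmul_for_ratio(sew: int, ratio: int):
      """Return the LMUL giving SEW/LMUL == ratio, or None if the table cell is empty."""
      lmul = Fraction(sew, ratio)
      return lmul if lmul in LEGAL_LMUL else None


  def operating_point(ratio: int, sews=(8, 16, 32, 64)) -> dict:
      """Map each element width to the LMUL that keeps SEW/LMUL constant."""
      return {sew: lmul_for_ratio(sew, ratio) for sew in sews}

Calling `operating_point(16)` gives LMUL values 1/2, 1, 2, and 4 for SEW=8, 16,
32, and 64, which is the column used in the VLEN=128b example earlier in this
section.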


..
  Larger LMUL settings can also used to simply increase vector length to
  reduce instruction fetch and dispatch overheads in cases where fewer
  vector register groups are needed.


Larger LMUL settings can also be used simply to increase vector length and
so reduce instruction fetch and dispatch overheads in cases where fewer
vector register groups are needed.

..
  NOTE: The SEW/LMUL values of 2048 and greater are shown in the table
  for completeness but they do not add a useful operating point as they
  use less than the full register capacity and do not enable more
  architectural registers.


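.. note::

  The SEW/LMUL values of 2048 and greater are shown in the table
  for completeness but they do not add a useful operating point as they
  use less than the full register capacity and do not enable more
  architectural registers.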
  

***********************************************
Mapping with LMUL > 1 and ELEN > VLEN
***********************************************

..
  If vector registers are grouped to support larger SEW, with ELEN >
  VLEN, the vector registers in the group are concatenated to form a
  single array of bytes, with the lowest-numbered register in the group
  holding the lowest-addressed bytes from the memory layout.


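If vector registers are grouped to support larger SEW, with ELEN >
VLEN, the vector registers in the group are concatenated to form a
single array of bytes, with the lowest-numbered register in the group
holding the lowest-addressed bytes from the memory layout.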

::

   LMUL > 1 ELEN>VLEN, examples
  
   VLEN=32b, SEW=64b, LMUL=2
  
   Byte         3 2 1 0
   v2*n               0
   v2*n+1
  
   VLEN=32b, SEW=64b, LMUL=4
  
   Byte         3 2 1 0
   v4*n               0
   v4*n+1
   v4*n+2             1
   v4*n+3
  
   VLEN=32b, SEW=64b, LMUL=8
  
   Byte         3 2 1 0
   v8*n               0
   v8*n+1
   v8*n+2             1
   v8*n+3
   v8*n+4             2
   v8*n+5
   v8*n+6             3
   v8*n+7
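
When an element is wider than a register, it spans SEW/VLEN consecutive
registers of the group. A minimal sketch, assuming SEW is a multiple of VLEN
and using hypothetical names:

.. code-block:: python

  def registers_for_wide_element(i: int, vlen: int, sew: int, base_reg: int) -> list:
      """List the registers holding element i, lowest-addressed bytes first."""
      regs_per_elem = sew // vlen
      first = base_reg + i * regs_per_elem
      return list(range(first, first + regs_per_elem))

For VLEN=32b, SEW=64b, LMUL=8 with the group based at `v8`, element 2 occupies
`v12` and `v13`, corresponding to the `v8*n+4` and `v8*n+5` rows above.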
  

.. _sec-mask-register-layout:


****************************************
Mask Register Layout
****************************************

..
  A vector mask occupies only one vector register regardless of SEW and
  LMUL.  Each element is allocated a single mask bit in a mask vector
  register.


A vector mask occupies only one vector register regardless of SEW and
LMUL.  Each element is allocated a single mask bit in a mask vector
register.

..
  NOTE: Earlier designs (pre-0.9) had a varying number of bits per mask
  value (MLEN).  In the 0.9 design, MLEN=1.


.. note::

  Earlier designs (pre-0.9) had a varying number of bits per mask
  value (MLEN).  In the 0.9 design, MLEN=1.
  

=========================
Mask Element Locations
=========================

..
  The mask bit for element *i* is located in bit *i* of the mask
  register, independent of SEW or LMUL.


The mask bit for element *i* is located in bit *i* of the mask
register, independent of SEW or LMUL.


::

   VLEN=32b
  
            Byte    3   2   1   0
   LMUL=1,SEW=8b
                    3   2   1   0  Element
                  [03][02][01][00] Mask bit position in decimal
  
   LMUL=2,SEW=16b
                        1       0
                      [01]    [00]
                        3       2
                      [03]    [02]
  
   LMUL=4,SEW=32b               0
                              [00]
                                1
                              [01]
                                2
                              [02]
                                3
                              [03]
  

::

   LMUL=2,SEW=8b
                    3   2   1   0
                  [03][02][01][00]
                    7   6   5   4
                  [07][06][05][04]
  
   LMUL=8,SEW=32b
                                0
                              [00]
                                1
                              [01]
                                2
                              [02]
                                3
                              [03]
                                4
                              [04]
                                5
                              [05]
                                6
                              [06]
                                7
                              [07]
  
   LMUL=8,SEW=8b
                    3   2   1   0
                  [03][02][01][00]
                    7   6   5   4
                  [07][06][05][04]
                    B   A   9   8
                  [11][10][09][08]
                    F   E   D   C
                  [15][14][13][12]
                   13  12  11  10
                  [19][18][17][16]
                   17  16  15  14
                  [23][22][21][20]
                   1B  1A  19  18
                  [27][26][25][24]
                   1F  1E  1D  1C
                  [31][30][29][28]
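
Since the mask bit for element *i* is always bit *i*, reading it does not
depend on SEW or LMUL. The sketch below models a mask register as a Python
`bytes` object with byte 0 first; that representation and the function name
are assumptions of this example.

.. code-block:: python

  def mask_bit(mask_reg: bytes, i: int) -> int:
      """Return the mask bit for element i from a little-endian register image."""
      return (mask_reg[i // 8] >> (i % 8)) & 1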
  

#####################################
Vector Instruction Formats
#####################################

..
  The instructions in the vector extension fit under two existing major
  opcodes (LOAD-FP and STORE-FP) and one new major opcode (OP-V).


The instructions in the vector extension fit under two existing major
opcodes (LOAD-FP and STORE-FP) and one new major opcode (OP-V).

..
  Vector loads and stores are encoded within the scalar floating-point
  load and store major opcodes (LOAD-FP/STORE-FP).  The vector load and
  store encodings repurpose a portion of the standard scalar
  floating-point load/store 12-bit immediate field to provide further
  vector instruction encoding, with bit 25 holding the standard vector
  mask bit (see :ref:`sec-vector-mask-encoding` ).


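Vector loads and stores are encoded within the scalar floating-point
load and store major opcodes (LOAD-FP/STORE-FP).  The vector load and
store encodings repurpose a portion of the standard scalar
floating-point load/store 12-bit immediate field to provide further
vector instruction encoding, with bit 25 holding the standard vector
mask bit (see :ref:`sec-vector-mask-encoding`).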

include::vmem-format.adoc[]

include::valu-format.adoc[]

include::vcfg-format.adoc[]

..
  Vector instructions can have scalar or vector source operands and
  produce scalar or vector results, and most vector instructions can be
  performed either unconditionally or conditionally under a mask.


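Vector instructions can have scalar or vector source operands and
produce scalar or vector results, and most vector instructions can be
performed either unconditionally or conditionally under a mask.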

..
  Vector loads and stores move bit patterns between vector register
  elements and memory.  Vector arithmetic instructions operate on values
  held in vector register elements.


Vector loads and stores move bit patterns between vector register
elements and memory.  Vector arithmetic instructions operate on values
held in vector register elements.


*************************
Scalar Operands
*************************

..
  Scalar operands can be immediates, or taken from the `x` registers,
  the `f` registers, or element 0 of a vector register.  Scalar results
  are written to an `x` or `f` register or to element 0 of a vector
  register.  Any vector register can be used to hold a scalar regardless
  of the current LMUL setting.


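Scalar operands can be immediates, or taken from the `x` registers,
the `f` registers, or element 0 of a vector register.  Scalar results
are written to an `x` or `f` register or to element 0 of a vector
register.  Any vector register can be used to hold a scalar regardless
of the current LMUL setting.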

..
  NOTE: In a change from v0.6, the floating-point registers no longer
  overlay the vector registers and scalars can now come from the integer
  or floating-point registers.  Not overlaying the `f` registers reduces
  vector register pressure, avoids interactions with the standard
  calling convention, simplifies high-performance scalar floating-point
  design, and provides compatibility with the Zfinx ISA option.
  Overlaying `f` with `v` would provide the advantage of lowering the
  number of state bits in some implementations, but complicates
  high-performance designs and would prevent compatibility with the
  Zfinx ISA option.


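.. note::

  In a change from v0.6, the floating-point registers no longer
  overlay the vector registers, and scalars can now come from the integer
  or floating-point registers.  Not overlaying the `f` registers reduces
  vector register pressure, avoids interactions with the standard
  calling convention, simplifies high-performance scalar floating-point
  design, and provides compatibility with the Zfinx ISA option.
  Overlaying `f` with `v` would provide the advantage of lowering the
  number of state bits in some implementations, but complicates
  high-performance designs and would prevent compatibility with the
  Zfinx ISA option.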
  

****************************
Vector Operands
****************************

..
  Each vector operand has an *effective* *element* *width* (EEW) and an
  *effective* LMUL (EMUL) that is used to determine the size and
  location of all the elements within a vector register group.  By
  default, for most operands of most instructions, EEW=SEW and
  EMUL=LMUL.


Each vector operand has an *effective element width* (EEW) and an
*effective LMUL* (EMUL) that are used to determine the size and
location of all the elements within a vector register group.  By
default, for most operands of most instructions, EEW=SEW and
EMUL=LMUL.

..
  Some vector instructions have source and destination vector operands
  with the same number of elements but different widths, so that EEW and
  EMUL differ from SEW and LMUL respectively but EEW/EMUL = SEW/LMUL.
  For example, most widening arithmetic instructions have a source group
  with EEW=SEW and EMUL=LMUL but destination group with EEW=2*SEW and
  EMUL=2*LMUL.  Narrowing instructions have a source operand that has
  EEW=2*SEW and EMUL=2*LMUL but destination where EEW=SEW and EMUL=LMUL.


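Some vector instructions have source and destination vector operands
with the same number of elements but different widths, so that EEW and
EMUL differ from SEW and LMUL respectively but EEW/EMUL = SEW/LMUL.
For example, most widening arithmetic instructions have a source group
with EEW=SEW and EMUL=LMUL but a destination group with EEW=2*SEW and
EMUL=2*LMUL.  Narrowing instructions have a source operand that has
EEW=2*SEW and EMUL=2*LMUL but a destination where EEW=SEW and EMUL=LMUL.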

..
  Vector operands or results may occupy one or more vector registers
  depending on EMUL, but are always specified using the lowest-numbered
  vector register in the group.  Using other than the lowest-numbered
  vector register to specify a vector register group is a reserved
  encoding.


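Vector operands or results may occupy one or more vector registers
depending on EMUL, but are always specified using the lowest-numbered
vector register in the group.  Using other than the lowest-numbered
vector register to specify a vector register group is a reserved
encoding.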

..
  A destination vector register group can overlap a source vector register
  group only if one of the following holds:
  
  - The destination EEW equals the source EEW.
  - The destination EEW is smaller than the source EEW and the overlap is in
    the lowest-numbered part of the source register group (e.g., when LMUL=1,
    `vnsrl.wi v0, v0, 3` is legal, but a destination of `v1` is not).
  - The destination EEW is greater than the source EEW, the source EMUL is
    at least 1, and the overlap is in the highest-numbered part of the
    destination register group (e.g., when LMUL=8, `vzext.vf4 v0, v6` is legal,
    but a source of `v0`, `v2`, or `v4` is not).
  
  For the purpose of register group overlap constraints, mask elements have
  EEW=1.


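A destination vector register group can overlap a source vector register
group only if one of the following holds:

- The destination EEW equals the source EEW.
- The destination EEW is smaller than the source EEW and the overlap is in
  the lowest-numbered part of the source register group (e.g., when LMUL=1,
  `vnsrl.wi v0, v0, 3` is legal, but a destination of `v1` is not).
- The destination EEW is greater than the source EEW, the source EMUL is
  at least 1, and the overlap is in the highest-numbered part of the
  destination register group (e.g., when LMUL=8, `vzext.vf4 v0, v6` is legal,
  but a source of `v0`, `v2`, or `v4` is not).

For the purpose of register group overlap constraints, mask elements have
EEW=1.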
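
The three overlap conditions can be checked mechanically. The sketch below is
one possible reading of the rules, not a normative definition: it interprets
"lowest-numbered part" and "highest-numbered part" as a contiguous run of
registers at the start or end of the group, and all names are hypothetical.

.. code-block:: python

  from fractions import Fraction


  def group_registers(base: int, emul) -> set:
      """Registers occupied by a group; a fractional EMUL still occupies one register."""
      return set(range(base, base + max(1, int(Fraction(emul)))))


  def overlap_allowed(dst_base, dst_eew, dst_emul, src_base, src_eew, src_emul) -> bool:
      """Return True if the destination group may overlap the source group."""
      dst = group_registers(dst_base, dst_emul)
      src = group_registers(src_base, src_emul)
      common = dst & src
      if not common:
          return True  # no overlap, so nothing to check
      if dst_eew == src_eew:
          return True
      if dst_eew < src_eew:
          # Overlap must be the lowest-numbered part of the source group.
          return common == set(range(src_base, src_base + len(common)))
      # dst_eew > src_eew: source EMUL must be at least 1 and the overlap
      # must be the highest-numbered part of the destination group.
      top = dst_base + len(dst)
      return Fraction(src_emul) >= 1 and common == set(range(top - len(common), top))

With SEW=8 and LMUL=1, `vnsrl.wi v0, v0, 3` corresponds to
`overlap_allowed(0, 8, 1, 0, 16, 2)` and is accepted, while moving the
destination to `v1` is rejected, matching the example in the second rule.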

..
  The largest vector register group used by an instruction can not be
  greater than 8 vector registers (i.e., EMUL{le}8), and if a vector
  instruction would require greater than 8 vector registers in a group,
  the instruction encoding is reserved.  For example, a widening
  operation that produces a widened vector register group result when
  LMUL=8 is reserved as this would imply a result EMUL=16.


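The largest vector register group used by an instruction cannot be
greater than 8 vector registers (i.e., EMUL ≤ 8), and if a vector
instruction would require more than 8 vector registers in a group,
the instruction encoding is reserved.  For example, a widening
operation that produces a widened vector register group result when
LMUL=8 is reserved, as this would imply a result EMUL=16.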
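
A short sketch of how a widened operand's EEW and EMUL follow from SEW and
LMUL, and of the EMUL ≤ 8 limit. The function names are hypothetical.

.. code-block:: python

  from fractions import Fraction


  def widened_operand(sew: int, lmul) -> tuple:
      """Return (EEW, EMUL) of a widened operand: twice SEW and twice LMUL."""
      return 2 * sew, 2 * Fraction(lmul)


  def encoding_is_reserved(emul) -> bool:
      """An operand needing more than 8 registers makes the encoding reserved."""
      return Fraction(emul) > 8

Doubling both values keeps EEW/EMUL equal to SEW/LMUL, so the widened group
holds the same number of elements as the source group.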

..
  Widened scalar values, e.g., results from widening reduction
  operations, are held in the first element of a vector register and
  have EMUL=1.


Widened scalar values, e.g., results from widening reduction
operations, are held in the first element of a vector register and
have EMUL=1.

..
  NOTE: Current reduction operations are defined to hold input and
  output values in a single vector register, with implicit EMUL of 1, so
  cannot accommodate using a vector register group to hold a wide scalar
  reduction result.  This would require an independent parameter to give
  the EMUL for the scalar reduction element.


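.. note::

  Current reduction operations are defined to hold input and
  output values in a single vector register, with implicit EMUL of 1, so
  cannot accommodate using a vector register group to hold a wide scalar
  reduction result.  This would require an independent parameter to give
  the EMUL for the scalar reduction element.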
  

****************************
Vector Masking
****************************

..
  Masking is supported on many vector instructions.  Element operations
  that are masked off (inactive) never generate exceptions.  The
  destination vector register elements corresponding to masked-off
  elements are handled with either a mask-undisturbed or mask-agnostic
  policy depending on the setting of the `vma` bit in `vtype` (Section
  :ref:`sec-agnostic` ).


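Masking is supported on many vector instructions.  Element operations
that are masked off (inactive) never generate exceptions.  The
destination vector register elements corresponding to masked-off
elements are handled with either a mask-undisturbed or mask-agnostic
policy depending on the setting of the `vma` bit in `vtype` (Section
:ref:`sec-agnostic`).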

..
  The mask value used to control execution of a masked vector
  instruction is always supplied by vector register `v0`.


The mask value used to control execution of a masked vector
instruction is always supplied by vector register `v0`.
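
The effect of masking on destination elements can be illustrated with a toy
model of an element-wise add. Everything here is an assumption of this
example: vectors are Python lists, the `v0` mask is an integer whose bit *i*
controls element *i*, `vm` follows the instruction encoding (1 means
unmasked), and filling inactive elements with all 1s is used as one possible
outcome of the mask-agnostic policy.

.. code-block:: python

  def masked_add(vd, vs2, vs1, v0_mask: int, vm: int,
                 mask_agnostic: bool = False, sew: int = 8) -> list:
      """Toy model of a masked vector add over lists of SEW-bit elements."""
      all_ones = (1 << sew) - 1
      out = list(vd)
      for i in range(len(vd)):
          active = vm == 1 or ((v0_mask >> i) & 1) == 1
          if active:
              out[i] = (vs2[i] + vs1[i]) & all_ones
          elif mask_agnostic:
              out[i] = all_ones  # one permitted agnostic outcome (assumed here)
          # otherwise mask-undisturbed: out[i] keeps its previous value
      return out

With mask bits `0b0101`, only elements 0 and 2 are computed; elements 1 and 3
either keep their old destination values or are overwritten, depending on the
policy selected.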

..
  NOTE: Future vector extensions may provide longer instruction
  encodings with space for a full mask register specifier.


.. note::

  Future vector extensions may provide longer instruction
  encodings with space for a full mask register specifier.
  
..
  The destination vector register group for a masked vector instruction
  cannot overlap the source mask register (`v0`), unless the destination
  vector register is being written with a mask value (e.g., comparisons)
  or the scalar result of a reduction.  These instruction encodings are
  reserved.


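The destination vector register group for a masked vector instruction cannot overlap the source mask register (`v0`),
unless the destination vector register is being written with a mask value (e.g., comparisons) or the scalar result of a reduction.
Encodings that violate this constraint are reserved.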

..
  NOTE: This constraint supports restart with a non-zero `vstart` value.


.. note::

  This constraint supports restart with a non-zero `vstart` value.
  
..
  NOTE: Some masked instructions that target `v0` which were legal in
  v0.8 are illegal with the new MLEN=1 mask layout for v1.0. For
  example, `vadd.vv v0, v1, v2, v0.m` is now always illegal; previously,
  it was legal for LMUL=1.


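.. note::

  Some masked instructions that target `v0` and were legal in v0.8 are illegal with the new MLEN=1 mask layout for v1.0.
  For example, `vadd.vv v0, v1, v2, v0.m` is now always illegal; previously, it was legal for LMUL=1.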
  
..
  Other vector registers can be used to hold working mask values, and
  mask vector logical operations are provided to perform predicate
  calculations. [[sec-mask-vector-logical]]


Other vector registers can be used to hold working mask values,
and mask vector logical operations are provided to perform predicate calculations.

.. _sec-mask-vector-logical:

..
  As specified in Section :ref:`sec-agnostic` , mask destination values are
  always treated as tail-agnostic, regardless of the setting of `vta`.


As specified in Section :ref:`sec-agnostic`, mask destination values are always treated as tail-agnostic,
regardless of the setting of `vta`.

.. _sec-vector-mask-encoding:


==================================
Mask Encoding
==================================

..
  Where available, masking is encoded in a single-bit `vm` field in the
   instruction (`inst[25]`).


Where available, masking is encoded in a single-bit `vm` field in the instruction (`inst[25]`).

..
  [cols="1,15"]
  |===
  | vm | Description
  
  | 0 | vector result, only where v0.mask[i] = 1
  | 1 | unmasked
  |===


+----+------------------------------------------+
| vm | Description                              |
+====+==========================================+
| 0  | vector result, only where v0.mask[i] = 1 |
+----+------------------------------------------+
| 1  | unmasked                                 |
+----+------------------------------------------+
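
As a rough illustration of the `vm` encoding, the following Python sketch models a masked element-wise add on plain lists.
The name `masked_add` and the list-based register model are illustrative assumptions, not part of the specification.
Element-width wraparound is ignored, and inactive elements are simply left undisturbed (one of the two policies that `vma` can select).

.. code-block:: python

   def masked_add(vd, vs2, vs1, v0_mask, vm, vl):
       """Model of a masked or unmasked vector add on Python lists.

       vm=1 means unmasked; vm=0 writes a result only where v0_mask[i] == 1.
       Inactive and tail elements keep their previous values in this sketch.
       """
       result = list(vd)
       for i in range(vl):
           if vm == 1 or v0_mask[i] == 1:
               result[i] = vs2[i] + vs1[i]
       return result


   vd = [0, 0, 0, 0]
   mask = [1, 0, 1, 0]
   masked = masked_add(vd, [1, 2, 3, 4], [10, 20, 30, 40], mask, vm=0, vl=4)
   unmasked = masked_add(vd, [1, 2, 3, 4], [10, 20, 30, 40], mask, vm=1, vl=4)

With `vm=0`, only elements 0 and 2 receive a sum; with `vm=1`, the mask is ignored and every body element is written.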


..
  NOTE: In earlier proposals, `vm` was a two-bit field `vm[1:0]` that
  provided both true and complement masking using `v0` as well as
  encoding scalar operations.


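.. note::

  In earlier proposals, `vm` was a two-bit field `vm[1:0]` that provided both true and complement masking using `v0`,
  as well as encoding scalar operations.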
  
..
  Vector masking is represented in assembler code as another vector
  operand, with `.t` indicating if operation occurs when `v0.mask[i]` is
  `1`.  If no masking operand is specified, unmasked vector execution
  (`vm=1`) is assumed.


Vector masking is represented in assembler code as another vector operand, with `.t` indicating that the operation occurs where `v0.mask[i]` is `1`.
If no masking operand is specified, unmasked vector execution (`vm=1`) is assumed.

..
  ----
      vop.v*    v1, v2, v3, v0.t  # enabled where v0.mask[i]=1, m=0
      vop.v*    v1, v2, v3        # unmasked vector operation, m=1
  ----


::

      vop.v*    v1, v2, v3, v0.t  # enabled where v0.mask[i]=1, vm=0
      vop.v*    v1, v2, v3        # unmasked vector operation, vm=1
  

..
  NOTE: Even though current vector extensions only support one vector
  mask register `v0` and only the true form of predication, the assembly
  syntax writes it out in full to be compatible with future extensions
  that might add a mask register specifier and supporting both true and
  complement masking. The `.t` suffix on the masking operand also helps
  to visually encode the use of a mask.


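.. note::

  Even though current vector extensions only support one vector mask register `v0` and only the true form of predication,
  the assembly syntax writes it out in full to be compatible with future extensions that might add a mask register specifier and support both true and complement masking.
  The `.t` suffix on the masking operand also helps to visually encode the use of a mask.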
  
..
  NOTE: The `.mask` suffix is not part of the assembly syntax.
  We only append it in contexts where a mask vector is subscripted,
  e.g., `v0.mask[i]`.


.. note::

  The `.mask` suffix is not part of the assembly syntax.
  We only append it in contexts where a mask vector is subscripted, e.g., `v0.mask[i]`.
  
.. _sec-inactive-defs:


**********************************************************************************************
Prestart, Active, Inactive, Body, and Tail Element Definitions
**********************************************************************************************

..
  The destination element indices operated on during a vector
  instruction's execution can be divided into three disjoint subsets.


The destination element indices operated on during a vector instruction's execution can be divided into three disjoint subsets.

..
  * The *prestart* elements are those whose element index is less than the
  initial value in the `vstart` register.  The prestart elements do not
  raise exceptions and do not update the destination vector register.


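* The *prestart* elements are those whose element index is less than the initial value in the `vstart` register.
  The prestart elements do not raise exceptions and do not update the destination vector register.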

..
  * The *body* elements are those whose element index is greater than or equal
  to the initial value in the `vstart` register, and less than the current
  vector length setting in `vl`. The body can be split into two disjoint subsets:


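* The *body* elements are those whose element index is greater than or equal to the initial value in the `vstart` register,
  and less than the current vector length setting in `vl`. The body can be split into two disjoint subsets, *active* and *inactive*: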

..
  ** The *active* elements during a vector instruction's execution are the
  elements within the body and where the current mask is enabled at that element
  position.  The active elements can raise exceptions and update the destination
  vector register group.


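* The *active* elements during a vector instruction's execution are the elements within the body
  where the current mask is enabled at that element position.
  The active elements can raise exceptions and update the destination vector register group.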

..
  ** The *inactive* elements are the elements within the body
  but where the current mask is disabled at that element
  position.  The inactive elements do not raise exceptions and do not
  update any destination vector register group unless masked agnostic is
  specified (`vtype.vma`=1), in which case inactive elements may be
  overwritten with 1s.


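* The *inactive* elements are the elements within the body where the current mask is disabled at that element position.
  The inactive elements do not raise exceptions and do not update any destination vector register group
  unless masked agnostic is specified (`vtype.vma`=1),
  in which case inactive elements may be overwritten with 1s.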

..
  * The *tail* elements during a vector instruction's execution are the
  elements past the current vector length setting specified in `vl`.
  The tail elements do not raise exceptions, and do not update any
  destination vector register group unless tail agnostic is specified
  (`vtype.vta`=1), in which case tail elements may be overwritten with
  1s, or with the result of the instruction in the case of
  mask-producing instructions except for mask loads.  When LMUL < 1, the
  tail includes the elements past VLMAX that are held in the same vector
  register.


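* The *tail* elements during a vector instruction's execution are the elements past the current vector length setting specified in `vl`.
  The tail elements do not raise exceptions, and do not update any destination vector register group
  unless tail agnostic is specified (`vtype.vta`=1),
  in which case tail elements may be overwritten with 1s, or with the result of the instruction in the case of mask-producing instructions except for mask loads.
  When LMUL < 1, the tail includes the elements past VLMAX that are held in the same vector register.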

::

      for element index x
      prestart(x) = (0 <= x < vstart)
      body(x)     = (vstart <= x < vl)
      tail(x)     = (vl <= x < max(VLMAX,VLEN/SEW))
      mask(x)     = unmasked || v0.mask[x] == 1
      active(x)   = body(x) && mask(x)
      inactive(x) = body(x) && !mask(x)
  

..
  When `vstart` {ge} `vl`, there are no body elements, and no elements
  are updated in any destination vector register group, including that
  no tail elements are updated with agnostic values.


When `vstart` ≥ `vl`, there are no body elements, and no elements are updated in any destination vector register group.
In particular, no tail elements are updated with agnostic values.
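
The element categories defined in this section can be sketched in Python as follows.
The function name `classify` and its list-based mask argument are illustrative assumptions; the upper bound of the tail is left to the caller,
and when `vstart` exceeds `vl` this sketch reports indices below `vstart` as prestart.

.. code-block:: python

   def classify(x, vstart, vl, masked, v0_mask):
       """Classify destination element index x for one vector instruction.

       masked=False models an unmasked instruction (vm=1); otherwise
       v0_mask[x] supplies the mask bit for element x.
       """
       if x < vstart:
           return "prestart"
       if x < vl:
           enabled = (not masked) or v0_mask[x] == 1
           return "active" if enabled else "inactive"
       return "tail"


   v0_mask = [1, 0, 1, 1, 0, 1, 1, 1]
   kinds = [classify(x, vstart=1, vl=5, masked=True, v0_mask=v0_mask)
            for x in range(8)]

Here element 0 is prestart, elements 1 to 4 form the body (active or inactive according to the mask), and elements 5 to 7 are tail elements.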

..
  NOTE: As a consequence, when `vl`=0, no elements, including agnostic
  elements, are updated in the destination vector register group
  regardless of `vstart`.


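.. note::

  As a consequence, when `vl`=0, no elements, including agnostic elements, are updated in the destination vector register group, regardless of `vstart`.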
  
..
  Instructions that write an `x` register or `f` register
  do so even when `vstart` {ge} `vl`, including when `vl`=0.


Instructions that write an `x` register or `f` register
do so even when `vstart` ≥ `vl`, including when `vl`=0.

..
  NOTE: Some instructions such as `vslidedown` and `vrgather` may read
  indices past `vl` or even VLMAX in source vector register groups.  The
  general policy is to return the value 0 when the index is greater than
  VLMAX in the source vector register group.


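.. note::

  Some instructions, such as `vslidedown` and `vrgather`, may read indices past `vl` or even VLMAX in source vector register groups.
  The general policy is to return the value 0 when the index is greater than VLMAX in the source vector register group.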
  
.. _sec-vector-config:

##########################################################################
Configuration-Setting Instructions (`vsetvli`/`vsetivli`/`vsetvl`)
##########################################################################

..
  One of the common approaches to handling a large number of elements is
  "stripmining" where each iteration of a loop handles some number of elements,
  and the iterations continue until all elements have been processed. The RISC-V
  vector specification provides direct, portable support for this approach.
  The application specifies the total number of elements to be processed (the application vector length or AVL) as a
  candidate value for `vl`, and the hardware responds via a general-purpose
  register with the (frequently smaller) number of elements that the hardware
  will handle per iteration (stored in `vl`), based on the microarchitectural
  implementation and the `vtype` setting. A straightforward loop structure,
  shown in :ref:`example-stripmine-sew` , depicts the ease with which the code keeps
  track of the remaining number of elements and the amount per iteration handled
  by hardware.


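One of the common approaches to handling a large number of elements is "stripmining", where each iteration of a loop handles some number of elements,
and the iterations continue until all elements have been processed.
The RISC-V vector specification provides direct, portable support for this approach.
The application specifies the total number of elements to be processed (the application vector length, or AVL) as a candidate value for `vl`,
and the hardware responds via a general-purpose register with the (frequently smaller) number of elements that it will handle per iteration (stored in `vl`),
based on the microarchitectural implementation and the `vtype` setting.
The straightforward loop structure shown in :ref:`example-stripmine-sew` illustrates how easily the code keeps track of the remaining number of elements and the amount handled by hardware per iteration.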

..
  A set of instructions is provided to allow rapid configuration of the
  values in `vl` and `vtype` to match application needs.  The
  `vset{i}vl{i}` instructions set the `vtype` and `vl` CSRs based on
  their arguments, and write the new value of `vl` into `rd`.


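A set of instructions is provided to allow rapid configuration of the values in `vl` and `vtype` to match application needs.
The `vset{i}vl{i}` instructions set the `vtype` and `vl` CSRs based on their arguments, and write the new value of `vl` into `rd`.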

::

   vsetvli rd, rs1, vtypei   # rd = new vl, rs1 = AVL, vtypei = new vtype setting
   vsetivli rd, uimm, vtypei # rd = new vl, uimm = AVL, vtypei = new vtype setting
   vsetvl  rd, rs1, rs2      # rd = new vl, rs1 = AVL, rs2 = new vtype value
  

include::vcfg-format.adoc[]


*********************************
`vtype` encoding
*********************************

include::vtype-format.adoc[]

..
  The new `vtype` setting is encoded in the immediate fields of
  `vsetvli` and `vsetivli`,  and in the `rs2` register for `vsetvl`.


The new `vtype` setting is encoded in the immediate fields of `vsetvli` and `vsetivli`, and in the `rs2` register for `vsetvl`.

..
  ----
   Suggested assembler names used for vset{i}vli vtypei immediate
  
   e8    # SEW=8b
   e16   # SEW=16b
   e32   # SEW=32b
   e64   # SEW=64b
   e128  # SEW=128b
   e256  # SEW=256b
   e512  # SEW=512b
   e1024 # SEW=1024b
  
   mf8  # LMUL=1/8
   mf4  # LMUL=1/4
   mf2  # LMUL=1/2
   m1   # LMUL=1, assumed if m setting absent
   m2   # LMUL=2
   m4   # LMUL=4
   m8   # LMUL=8
  
  Examples:
      vsetvli t0, a0, e8          # SEW= 8, LMUL=1
      vsetvli t0, a0, e8, m2      # SEW= 8, LMUL=2
      vsetvli t0, a0, e32, mf2    # SEW=32, LMUL=1/2
  ----


::

   Suggested assembler names used for vset{i}vli vtypei immediate

   e8    # SEW=8b
   e16   # SEW=16b
   e32   # SEW=32b
   e64   # SEW=64b
   e128  # SEW=128b
   e256  # SEW=256b
   e512  # SEW=512b
   e1024 # SEW=1024b

   mf8  # LMUL=1/8
   mf4  # LMUL=1/4
   mf2  # LMUL=1/2
   m1   # LMUL=1, assumed if m setting absent
   m2   # LMUL=2
   m4   # LMUL=4
   m8   # LMUL=8

   Examples:
       vsetvli t0, a0, e8          # SEW= 8, LMUL=1
       vsetvli t0, a0, e8, m2      # SEW= 8, LMUL=2
       vsetvli t0, a0, e32, mf2    # SEW=32, LMUL=1/2
  

..
  The `vsetvl` variant operates similarly to `vsetvli` except that it
  takes a `vtype` value from `rs2` and can be used for context restore.


The `vsetvl` variant operates similarly to `vsetvli`, except that it takes a `vtype` value from `rs2`
and can be used for context restore.

..
  If the `vtype` setting is not supported by the implementation, then
  the `vill` bit is set in `vtype`, the remaining bits in `vtype` are
  set to zero, and the `vl` register is also set to zero.


If the `vtype` setting is not supported by the implementation, then the `vill` bit is set in `vtype`,
the remaining bits in `vtype` are set to zero, and the `vl` register is also set to zero.
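
A minimal Python sketch of this behavior follows.
The names `vsetvli_model` and `is_supported`, the default VLEN of 128, and the rule used to decide that a setting is unsupported (SEW outside a fixed set, or VLMAX = LMUL*VLEN/SEW below 1) are assumptions made for illustration;
a real implementation defines its own supported configurations.
The sketch also always picks `vl = min(AVL, VLMAX)`, which is only one of the choices the `vl` rules permit.

.. code-block:: python

   from fractions import Fraction

   SUPPORTED_SEW = (8, 16, 32, 64)  # assumption: a hypothetical ELEN=64 design


   def vsetvli_model(avl, sew, lmul, vlen=128):
       """Return (vill, vl) for a vsetvli-style request.

       On an unsupported setting, vill is set and vl becomes 0; the zeroing
       of the remaining vtype bits is not modeled here.
       """
       vlmax = Fraction(lmul) * vlen / sew
       if sew not in SUPPORTED_SEW or vlmax < 1:
           return True, 0
       return False, min(avl, int(vlmax))


   def is_supported(sew, lmul):
       """Runtime probe: a setting is usable if vill stays clear."""
       vill, _ = vsetvli_model(0, sew, lmul)
       return not vill

Probing with `is_supported` mirrors the idea of checking whether `vill` is clear for a given setting instead of relying on a trap.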

..
  NOTE: Earlier drafts required a trap when setting `vtype` to an
  illegal value.  However, this would have added the first
  data-dependent trap on a CSR write to the ISA.  Implementations may
  choose to trap when illegal values are written to `vtype` instead of
  setting `vill`, to allow emulation to support new configurations for
  forward-compatibility.  The current scheme supports light-weight
  runtime interrogation of the supported vector unit configurations by
  checking if `vill` is clear for a given setting.


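.. note::

  Earlier drafts required a trap when setting `vtype` to an illegal value.
  However, this would have added the first data-dependent trap on a CSR write to the ISA.
  Implementations may choose to trap when illegal values are written to `vtype` instead of setting `vill`,
  to allow emulation to support new configurations for forward-compatibility.
  The current scheme supports light-weight runtime interrogation of the supported vector unit configurations
  by checking if `vill` is clear for a given setting.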
  

*******************************
AVL encoding
*******************************

..
  The new vector
  length setting is based on AVL, which for `vsetvli` and `vsetvl` is encoded in the `rs1` and `rd`
  fields as follows:


The new vector length setting is based on AVL, which for `vsetvli` and `vsetvl` is encoded in the `rs1` and `rd` fields as follows:

..
  .AVL used in `vsetvli` and `vsetvl` instructions
  [cols="2,2,10,10"]
  [%autowidth]
  |===
  |  `rd`  | `rs1`  | AVL value         | Effect on `vl`
  |  -     | !x0    | Value in `x[rs1]` | Normal stripmining
  | !x0    |  x0    | ~0                | Set `vl` to VLMAX
  |  x0    |  x0    | Value in `vl` register | Keep existing `vl` (of course, `vtype` may change)
  |===


+-----+-----+----------------------+------------------------------------------------+
| rd  | rs1 | AVL value            | Effect on vl                                   |
+=====+=====+======================+================================================+
| any | !x0 | Value in x[rs1]      | Normal stripmining                             |
+-----+-----+----------------------+------------------------------------------------+
| !x0 | x0  | ~0                   | Set vl to VLMAX                                |
+-----+-----+----------------------+------------------------------------------------+
| x0  | x0  | Value in vl register | Keep existing vl (of course, vtype may change) |
+-----+-----+----------------------+------------------------------------------------+


..
  When `rs1` is not `x0`, the AVL is an unsigned integer held in the `x`
  register specified by `rs1`, and the new `vl` value is also written to
  the `x` register specified by `rd`.


When `rs1` is not `x0`, the AVL is an unsigned integer held in the `x` register specified by `rs1`,
and the new `vl` value is also written to the `x` register specified by `rd`.

..
  When `rs1=x0` but `rd!=x0`, the maximum unsigned integer value (`~0`)
  is used as the AVL, and the resulting VLMAX is written to `vl` and
  also to the `x` register specified by `rd`.


When `rs1=x0` but `rd!=x0`, the maximum unsigned integer value (`~0`) is used as the AVL,
and the resulting VLMAX is written to `vl` and also to the `x` register specified by `rd`.

..
  When `rs1=x0` and `rd=x0`, the instruction operates as if the current
  vector length in `vl` is used as the AVL, and the resulting value is
  written to `vl`, but not to a destination register.  This form can
  only be used when VLMAX and hence `vl` is not actually changed by the
  new SEW/LMUL ratio.  Use of the instruction with a new SEW/LMUL ratio
  that would result in a change of VLMAX is reserved.  Implementations
  may set `vill` in this case.


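When `rs1=x0` and `rd=x0`, the instruction operates as if the current vector length in `vl` is used as the AVL,
and the resulting value is written to `vl`, but not to a destination register.
This form can only be used when VLMAX, and hence `vl`, is not actually changed by the new SEW/LMUL ratio.
Use of the instruction with a new SEW/LMUL ratio that would result in a change of VLMAX is reserved.
Implementations may set `vill` in this case.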
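
The three AVL cases for `vsetvli` and `vsetvl` can be summarized with a small Python sketch.
The helper name `select_avl`, the dictionary used as a register file, and the default XLEN of 64 are illustrative assumptions;
the sketch does not check the reserved case in which the `x0`/`x0` form would change VLMAX.

.. code-block:: python

   def select_avl(rd, rs1, xregs, vl, xlen=64):
       """Pick the AVL from the rd/rs1 register numbers (0 denotes x0)."""
       if rs1 != 0:
           return xregs[rs1]        # normal stripmining: AVL = x[rs1]
       if rd != 0:
           return (1 << xlen) - 1   # ~0: requests vl = VLMAX
       return vl                    # keep the existing vl


   avl_normal = select_avl(rd=5, rs1=10, xregs={10: 100}, vl=7)
   avl_max = select_avl(rd=5, rs1=0, xregs={}, vl=7)
   avl_keep = select_avl(rd=0, rs1=0, xregs={}, vl=7)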

..
  NOTE: This last form of the instructions allows the `vtype` register to
  be changed while maintaining the current `vl`, provided VLMAX is not
  reduced.  This design was chosen to ensure `vl` would always hold a
  legal value for current `vtype` setting.  The current `vl` value can
  be read from the `vl` CSR.  The `vl` value could be reduced by this
  instruction if the new SEW/LMUL ratio causes VLMAX to shrink, and so
  this case has been reserved as it is not clear this is a generally
  useful operation, and implementations can otherwise assume `vl` is not
  changed by this instruction to optimize their microarchitecture.


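.. note::

  This last form of the instructions allows the `vtype` register to be changed while maintaining the current `vl`, provided VLMAX is not reduced.
  This design was chosen to ensure `vl` would always hold a legal value for the current `vtype` setting.
  The current `vl` value can be read from the `vl` CSR.
  The `vl` value could be reduced by this instruction if the new SEW/LMUL ratio causes VLMAX to shrink,
  so this case has been reserved, as it is not clear that this is a generally useful operation.
  Implementations can otherwise assume `vl` is not changed by this instruction to optimize their microarchitecture.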
  
..
  For the `vsetivli` instruction, the AVL is encoded as a 5-bit
  zero-extended immediate (0--31) in the `rs1` field.


For the `vsetivli` instruction, the AVL is encoded as a 5-bit
zero-extended immediate (0--31) in the `rs1` field.

..
  NOTE: The encoding of AVL for `vsetivli` is the same as for regular
  CSR immediate values.


.. note::

  The encoding of AVL for `vsetivli` is the same as for regular CSR immediate values.
  
..
  NOTE: The `vsetivli` instruction provides more compact code when the
  dimensions of vectors are small, and known to fit inside the vector
  registers, so do not need stripmining overhead.


.. note::

  The `vsetivli` instruction provides more compact code when the dimensions of vectors are small and known to fit inside the vector registers,
  so that no stripmining overhead is needed.


******************************
Constraints on Setting `vl`
******************************

..
  The `vset{i}vl{i}` instructions first set VLMAX according to the `vtype`
  argument, then set `vl` obeying the following constraints:


The `vset{i}vl{i}` instructions first set VLMAX according to the `vtype` argument,
then set `vl` obeying the following constraints:

..
  * `vl = AVL` if `AVL {le} VLMAX`
  * `ceil(AVL / 2) {le} vl {le} VLMAX` if `AVL < (2 * VLMAX)`
  * `vl = VLMAX` if `AVL {ge} (2 * VLMAX)`
  * Deterministic on any given implementation for same input AVL and VLMAX values
  * These specific properties follow from the prior rules:
   - `vl = 0` if  `AVL = 0`
   - `vl > 0` if `AVL > 0`
   - `vl {le} VLMAX`
   - `vl {le} AVL`
   - a value read from `vl` when used as the AVL argument to `vset{i}vl{i}` results in the same
  value in `vl`, provided the resultant VLMAX equals the value of VLMAX at the time that `vl` was read


* `vl = AVL` if `AVL ≤ VLMAX`
* `ceil(AVL / 2) ≤ vl ≤ VLMAX` if `AVL < (2 * VLMAX)`
* `vl = VLMAX` if `AVL ≥ (2 * VLMAX)`
* Deterministic on any given implementation for the same input AVL and VLMAX values
* These specific properties follow from the prior rules:

  - `vl = 0` if `AVL = 0`
  - `vl > 0` if `AVL > 0`
  - `vl ≤ VLMAX`
  - `vl ≤ AVL`
  - a value read from `vl`, when used as the AVL argument to `vset{i}vl{i}`, results in the same value in `vl`, provided the resultant VLMAX equals the value of VLMAX at the time that `vl` was read
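
The constraints above leave the choice of `vl` open only when `VLMAX < AVL < 2*VLMAX`.
The following Python sketch shows two policies that satisfy them and the stripmine chunk sizes each produces.
The names `set_vl` and `stripmine_chunks` and the `balance` flag are hypothetical, introduced only for this example.

.. code-block:: python

   def set_vl(avl, vlmax, balance=False):
       """Choose vl for a given AVL under the constraints listed above."""
       if avl <= vlmax:
           return avl
       if avl < 2 * vlmax:
           # Any value in [ceil(AVL / 2), VLMAX] is allowed in this range.
           return -(-avl // 2) if balance else vlmax
       return vlmax


   def stripmine_chunks(n, vlmax, balance=False):
       """Return the vl used by each iteration of a stripmine loop over n elements."""
       chunks = []
       while n > 0:
           vl = set_vl(n, vlmax, balance)
           chunks.append(vl)
           n -= vl
       return chunks


   greedy = stripmine_chunks(20, 8)
   balanced = stripmine_chunks(20, 8, balance=True)

Both policies start with the largest chunk; the balanced one spreads the remaining 12 elements evenly over the last two iterations.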

..
  [NOTE]
  --
  The `vl` setting rules are designed to be sufficiently strict to
  preserve `vl` behavior across register spills and context swaps for
  `AVL {le} VLMAX`, yet flexible enough to enable implementations to improve
  vector lane utilization for `AVL > VLMAX`.
  
  For example, this permits an implementation to set `vl = ceil(AVL / 2)`
  for `VLMAX < AVL < 2*VLMAX` in order to evenly distribute work over the
  last two iterations of a stripmine loop.
  Requirement 2 ensures that the first stripmine iteration of reduction
  loops uses the largest vector length of all iterations, even in the case
  of `AVL < 2*VLMAX`.
  This allows software to avoid needing to explicitly calculate a running
  maximum of vector lengths observed during a stripmined loop.
  Requirement 2 also allows an implementation to set vl to VLMAX for `VLMAX < AVL < 2*VLMAX`
  --


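.. note::

  The `vl` setting rules are designed to be sufficiently strict to preserve `vl` behavior across register spills and context swaps for `AVL ≤ VLMAX`,
  yet flexible enough to enable implementations to improve vector lane utilization for `AVL > VLMAX`.

  For example, this permits an implementation to set `vl = ceil(AVL / 2)` for `VLMAX < AVL < 2*VLMAX` in order to evenly distribute work over the last two iterations of a stripmine loop.
  Requirement 2 ensures that the first stripmine iteration of reduction loops uses the largest vector length of all iterations, even in the case of `AVL < 2*VLMAX`.
  This allows software to avoid needing to explicitly calculate a running maximum of vector lengths observed during a stripmined loop.
  Requirement 2 also allows an implementation to set `vl` to VLMAX for `VLMAX < AVL < 2*VLMAX`.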

.. _example-stripmine-sew:


*************************************************
Example of stripmining and changes to SEW
*************************************************

..
  The SEW and LMUL settings can be changed dynamically to provide high
  throughput on mixed-width operations in a single loop.


The SEW and LMUL settings can be changed dynamically to provide high throughput on mixed-width operations in a single loop.

..
  ----
  # Example: Load 16-bit values, widen multiply to 32b, shift 32b result
  # right by 3, store 32b values.
  # On entry:
  #  a0 holds the total number of elements to process
  #  a1 holds the address of the source array
  #  a2 holds the address of the destination array
  
  loop:
      vsetvli a3, a0, e16, m4, ta, ma  # vtype = 16-bit integer vectors;
                                       # also update a3 with vl (# of elements this iteration)
      vle16.v v4, (a1)        # Get 16b vector
      slli t1, a3, 1          # Multiply # elements this iteration by 2 bytes/source element
      add a1, a1, t1          # Bump pointer
      vwmul.vx v8, v4, x10    # Widening multiply into 32b in <v8--v15>
  
      vsetvli x0, x0, e32, m8, ta, ma  # Operate on 32b values
      vsrl.vi v8, v8, 3
      vse32.v v8, (a2)        # Store vector of 32b elements
      slli t1, a3, 2          # Multiply # elements this iteration by 4 bytes/destination element
      add a2, a2, t1          # Bump pointer
      sub a0, a0, a3          # Decrement count by vl
      bnez a0, loop           # Any more?
  ----


::

  # Example: Load 16-bit values, widen multiply to 32b, shift 32b result
  # right by 3, store 32b values.
  # On entry:
  #  a0 holds the total number of elements to process
  #  a1 holds the address of the source array
  #  a2 holds the address of the destination array
  loop:
      vsetvli a3, a0, e16, m4, ta, ma  # vtype = 16-bit integer vectors;
                                       # also update a3 with vl (# of elements this iteration)
      vle16.v v4, (a1)        # Get 16b vector
      slli t1, a3, 1          # Multiply # elements this iteration by 2 bytes/source element
      add a1, a1, t1          # Bump pointer
      vwmul.vx v8, v4, x10    # Widening multiply into 32b in <v8--v15>

      vsetvli x0, x0, e32, m8, ta, ma  # Operate on 32b values
      vsrl.vi v8, v8, 3
      vse32.v v8, (a2)        # Store vector of 32b elements
      slli t1, a3, 2          # Multiply # elements this iteration by 4 bytes/destination element
      add a2, a2, t1          # Bump pointer
      sub a0, a0, a3          # Decrement count by vl
      bnez a0, loop           # Any more?
  

.. _sec-vector-memory:

########################################
Vector Loads and Stores
########################################


..
  Vector loads and stores move values between vector registers and
  memory.  Vector loads and stores are masked and do not raise
  exceptions on inactive elements.  Masked vector loads do not update
  inactive elements in the destination vector register group, unless
  masked agnostic is specified (`vtype.vma`=1).  Masked vector stores
  only update active memory elements.  All vector loads and stores may
  generate and accept a non-zero `vstart` value.


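Vector loads and stores move values between vector registers and memory.
Vector loads and stores can be masked, and do not raise exceptions on inactive elements.
Masked vector loads do not update inactive elements in the destination vector register group,
unless masked agnostic is specified (`vtype.vma`=1).
Masked vector stores only update active memory elements.
All vector loads and stores may generate and accept a non-zero `vstart` value.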


****************************************************************
Vector Load/Store Instruction Encoding
****************************************************************

..
  Vector loads and stores are encoded within the scalar floating-point
  load and store major opcodes (LOAD-FP/STORE-FP).  The vector load and
  store encodings repurpose a portion of the standard scalar
  floating-point load/store 12-bit immediate field to provide further
  vector instruction encoding, with bit 25 holding the standard vector
  mask bit (see :ref:`sec-vector-mask-encoding` ).


include::vmem-format.adoc[]

..
  [cols="4,12"]
  |===
  | Field      | Description
  
  | rs1[4:0]   | specifies x register holding base address
  | rs2[4:0]   | specifies x register holding stride
  | vs2[4:0]   | specifies v register holding address offsets
  | vs3[4:0]   | specifies v register holding store data
  | vd[4:0]    | specifies v register destination of load
  | vm         | specifies whether vector masking is enabled (0 = mask enabled, 1 = mask disabled)
  | width[2:0] | specifies size of memory elements, and distinguishes from FP scalar
  | mew        | extended memory element width. See :ref:`sec-vector-loadstore-width-encoding` 
  | mop[1:0]   | specifies memory addressing mode
  | nf[2:0]    | specifies the number of fields in each segment, for segment load/stores
  | lumop[4:0]/sumop[4:0] | are additional fields encoding variants of unit-stride instructions
  |===


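.. list-table::
   :header-rows: 1
   :widths: 4 12

   * - Field
     - Description
   * - rs1[4:0]
     - specifies x register holding base address
   * - rs2[4:0]
     - specifies x register holding stride
   * - vs2[4:0]
     - specifies v register holding address offsets
   * - vs3[4:0]
     - specifies v register holding store data
   * - vd[4:0]
     - specifies v register destination of load
   * - vm
     - specifies whether vector masking is enabled (0 = mask enabled, 1 = mask disabled)
   * - width[2:0]
     - specifies size of memory elements, and distinguishes from FP scalar
   * - mew
     - extended memory element width. See :ref:`sec-vector-loadstore-width-encoding`
   * - mop[1:0]
     - specifies memory addressing mode
   * - nf[2:0]
     - specifies the number of fields in each segment, for segment load/stores
   * - lumop[4:0]/sumop[4:0]
     - are additional fields encoding variants of unit-stride instructions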


..
  Vector memory unit-stride and constant-stride operations directly
  encode EEW of the data to be transferred statically in the instruction
  to reduce the number of `vtype` changes when accessing memory in a
  mixed-width routine.  Indexed operations use the explicit EEW encoding
  in the instruction to set the size of the indices used, and use
  SEW/LMUL to specify the data width.


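Vector memory unit-stride and constant-stride operations directly encode the EEW of the data to be transferred statically in the instruction,
to reduce the number of `vtype` changes when accessing memory in a mixed-width routine.
Indexed operations use the explicit EEW encoding in the instruction to set the size of the indices used,
and use SEW/LMUL to specify the data width.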


*************************************************************
Vector Load/Store Addressing Modes
*************************************************************

..
  The vector extension supports unit-stride, strided, and
  indexed (scatter/gather) addressing modes.  Vector load/store base
  registers and strides are taken from the GPR `x` registers.


The vector extension supports unit-stride, strided, and indexed (scatter/gather) addressing modes.
Vector load/store base registers and strides are taken from the GPR `x` registers.

..
  The base effective address for all vector accesses is given by the
  contents of the `x` register named in `rs1`.


The base effective address for all vector accesses is given by
the contents of the `x` register named in `rs1`.

..
  Vector unit-stride operations access elements stored contiguously in
  memory starting from the base effective address.


Vector unit-stride operations access elements stored contiguously in memory starting from the base effective address.

..
  Vector constant-strided operations access the first memory element at the base
  effective address, and then access subsequent elements at address
  increments given by the byte offset contained in the `x` register
  specified by `rs2`.


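Vector constant-strided operations access the first memory element at the base effective address,
and then access subsequent elements at address increments given by the byte offset contained in the `x` register specified by `rs2`.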

..
  Vector indexed operations add the contents of each element of the
  vector offset operand specified by `vs2` to the base effective address
  to give the effective address of each element.  The data vector
  register group has EEW=SEW, EMUL=LMUL, while the offset vector
  register group has EEW encoding in the instruction and
  EMUL=(EEW/SEW)*LMUL.


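Vector indexed operations add the contents of each element of the vector offset operand specified by `vs2` to the base effective address
to give the effective address of each element.
The data vector register group has EEW=SEW and EMUL=LMUL, while the offset vector register group
has its EEW encoded in the instruction and EMUL=(EEW/SEW)*LMUL.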

..
  The vector offset operand is treated as a vector of byte-address
  offsets.


The vector offset operand is treated as a vector of byte-address offsets.

..
  NOTE: The indexed operations can also be used to access fields within
  a vector of objects, where the `vs2` vector holds pointers to the base
  of the objects and the scalar `x` register holds the offset of the
  member field in each object.  Supporting this case is why the indexed
  operations were not defined to scale the element indices by the data
  EEW.


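.. note::

  The indexed operations can also be used to access fields within a vector of objects,
  where the `vs2` vector holds pointers to the base of the objects and the scalar `x` register holds the offset of the member field in each object.
  Supporting this case is why the indexed operations were not defined to scale the element indices by the data EEW.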
  
..
  If the vector offset elements are narrower than XLEN, they are
  zero-extended to XLEN before adding to the base effective address.  If
  the vector offset elements are wider than XLEN, the least-significant
  XLEN bits are used in the address calculation.  An implementation can
  raise an illegal instruction exception if the EEW is not supported for
  offset elements.


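If the vector offset elements are narrower than XLEN, they are zero-extended to XLEN before adding to the base effective address.
If the vector offset elements are wider than XLEN, the least-significant XLEN bits are used in the address calculation.
An implementation can raise an illegal instruction exception if the EEW is not supported for offset elements.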
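
The three addressing modes can be sketched in Python as address generators.
The function names and the default XLEN of 64 are illustrative assumptions; masking, `vstart`, and segment accesses are not modeled,
and addresses are simply wrapped to XLEN bits.

.. code-block:: python

   def unit_stride_addrs(base, eew_bits, vl):
       """Contiguous elements of eew_bits each, starting at the base address."""
       step = eew_bits // 8
       return [base + i * step for i in range(vl)]


   def strided_addrs(base, stride, vl, xlen=64):
       """Element i is at base + i * stride, where stride is a byte offset."""
       mask = (1 << xlen) - 1
       return [(base + i * stride) & mask for i in range(vl)]


   def indexed_addrs(base, offsets, index_eew, xlen=64):
       """Element i is at base + offsets[i], with offsets as unsigned byte offsets."""
       mask = (1 << xlen) - 1
       addrs = []
       for off in offsets:
           off &= (1 << index_eew) - 1  # raw bits of the index element
           off &= mask                  # keep only the low XLEN bits if wider
           addrs.append((base + off) & mask)
       return addrs


   unit = unit_stride_addrs(0x1000, 32, 4)
   strided = strided_addrs(0x1000, 16, 3)
   indexed = indexed_addrs(0x1000, [0, 8, 0xFF], index_eew=8)

Note that the 8-bit offset `0xFF` is zero-extended to 255 and not treated as -1.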

..
  NOTE: A profile may place an upper limit on the maximum supported index
  EEW (e.g., only up to XLEN) smaller than ELEN.


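.. note::

  A profile may place an upper limit on the maximum supported index EEW (e.g., only up to XLEN) smaller than ELEN.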
  
..
  The vector addressing modes are encoded using the 2-bit `mop[1:0]`
  field.


The vector addressing modes are encoded using the 2-bit `mop[1:0]` field.

..
  .encoding for loads
  [cols="1,1,7,6"]
  |===
  2+| mop [1:0] | Description | Opcodes
  
  | 0 | 0 | unit-stride       | VLE<EEW>
  | 0 | 1 | indexed-unordered | VLUXEI<EEW>
  | 1 | 0 | strided           | VLSE<EEW>
  | 1 | 1 | indexed-ordered   | VLOXEI<EEW>
  |===


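.. table:: `mop` encoding for loads

   +-----------+-------------------+-------------+
   | mop [1:0] | Description       | Opcodes     |
   +=====+=====+===================+=============+
   | 0   | 0   | unit-stride       | VLE<EEW>    |
   +-----+-----+-------------------+-------------+
   | 0   | 1   | indexed-unordered | VLUXEI<EEW> |
   +-----+-----+-------------------+-------------+
   | 1   | 0   | strided           | VLSE<EEW>   |
   +-----+-----+-------------------+-------------+
   | 1   | 1   | indexed-ordered   | VLOXEI<EEW> |
   +-----+-----+-------------------+-------------+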


..
  .encoding for stores
  [cols="1,1,7,6"]
  |===
  2+| mop [1:0] | Description | Opcodes
  
  | 0 | 0 | unit-stride       | VSE<EEW>
  | 0 | 1 | indexed-unordered | VSUXEI<EEW>
  | 1 | 0 | strided           | VSSE<EEW>
  | 1 | 1 | indexed-ordered   | VSOXEI<EEW>
  |===


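.. table:: `mop` encoding for stores

   +-----------+-------------------+-------------+
   | mop [1:0] | Description       | Opcodes     |
   +=====+=====+===================+=============+
   | 0   | 0   | unit-stride       | VSE<EEW>    |
   +-----+-----+-------------------+-------------+
   | 0   | 1   | indexed-unordered | VSUXEI<EEW> |
   +-----+-----+-------------------+-------------+
   | 1   | 0   | strided           | VSSE<EEW>   |
   +-----+-----+-------------------+-------------+
   | 1   | 1   | indexed-ordered   | VSOXEI<EEW> |
   +-----+-----+-------------------+-------------+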


..
  Vector unit-stride and constant-stride memory accesses do not
  guarantee ordering between individual element accesses.  The vector
  indexed load and store memory operations have two forms, ordered and
  unordered.  The indexed-ordered variants preserve element ordering on
  memory accesses.


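Vector unit-stride and constant-stride memory accesses do not guarantee ordering between individual element accesses.
The vector indexed load and store memory operations have two forms, ordered and unordered.
The indexed-ordered variants preserve element ordering on memory accesses.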

..
  For unordered instructions (`mop`!=11) there is no guarantee on
  element access order.  If the accesses are to a strongly ordered IO
  region, the element accesses can be initiated in any order.


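For unordered instructions (`mop`!=11) there is no guarantee on element access order.
If the accesses are to a strongly ordered IO region, the element accesses can be initiated in any order.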

..
  NOTE: To provide ordered vector accesses to a strongly ordered IO
  region, the ordered indexed instructions should be used.


.. note::

  To provide ordered vector accesses to a strongly ordered IO region, the ordered indexed instructions should be used.
  
..
  For implementations with precise vector traps, exceptions on
  indexed-unordered stores must also be precise.


For implementations with precise vector traps,
exceptions on indexed-unordered stores must also be precise.

..
  Additional unit-stride vector addressing modes are encoded using the
  5-bit `lumop` and `sumop` fields in the unit-stride load and store
  instruction encodings respectively.


Additional unit-stride vector addressing modes are encoded using
the 5-bit `lumop` and `sumop` fields
in the unit-stride load and store instruction encodings respectively.

..


..


..
  The `nf[2:0]` field encodes the number of fields in each segment.  For
  regular vector loads and stores, `nf`=0, indicating that a single
  value is moved between a vector register group and memory at each
  element position.  Larger values in the `nf` field are used to access
  multiple contiguous fields within a segment as described below in
  Section :ref:`sec-aos` .


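The `nf[2:0]` field encodes the number of fields in each segment.
For regular vector loads and stores, `nf`=0, indicating that a single value is moved between a vector register group and memory at each element position.
Larger values in the `nf` field are used to access multiple contiguous fields within a segment,
as described below in Section :ref:`sec-aos`.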

..
  NOTE: The `nf` field for segment load/stores has replaced the use of
  the same bits for an address offset field.  The offset can be replaced
  with a single scalar integer calculation, while segment load/stores
  add more powerful primitives to move items to and from memory.


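.. note::

  The `nf` field for segment load/stores has replaced the use of the same bits for an address offset field.
  The offset can be replaced with a single scalar integer calculation,
  while segment load/stores add more powerful primitives to move items to and from memory.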
  
..
  The `nf[2:0]` field also encodes the number of whole vector registers
  to transfer for the whole vector register load/store instructions.


The `nf[2:0]` field also encodes the number of whole vector registers
to transfer for the whole vector register load/store instructions.

.. _sec-vector-loadstore-width-encoding:

*************************************************************
Vector Load/Store Width Encoding
*************************************************************

..
  Vector loads and stores have an EEW encoded directly in the
  instruction.  The corresponding EMUL is calculated as EMUL =
  (EEW/SEW)*LMUL. If the EMUL would be out of range (EMUL>8 or
  EMUL<1/8), the instruction encoding is reserved.  The vector register
  groups must have legal register specifiers for the selected EMUL;
  the instruction encoding is otherwise considered reserved.


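Vector loads and stores have an EEW encoded directly in the instruction.
The corresponding EMUL is calculated as EMUL = (EEW/SEW) * LMUL.
If the EMUL would be out of range (EMUL > 8 or EMUL < 1/8), the
instruction encoding is reserved.  The vector register groups must have
legal register specifiers for the selected EMUL; otherwise the
instruction encoding is reserved.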
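
The EMUL rule can be checked with a few lines of Python.  This is only
an illustrative sketch: the function name ``load_store_emul`` is made up
here, and it models the range check alone, not the register-specifier
check.

.. code-block:: python

  from fractions import Fraction

  def load_store_emul(eew, sew, lmul):
      """Return EMUL = (EEW/SEW) * LMUL, or None if the encoding is reserved."""
      emul = Fraction(eew, sew) * Fraction(lmul)
      if emul > 8 or emul < Fraction(1, 8):
          return None  # EMUL out of range: reserved encoding
      return emul

For example, a 16-bit load with SEW=32 and LMUL=1 uses EMUL=1/2, while a
64-bit load with SEW=8 and LMUL=2 would need EMUL=16 and is therefore
reserved.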

..
  Vector unit-stride and constant-stride use the EEW/EMUL encoded in the
  instruction for the data values, while vector indexed loads and stores
  use the EEW/EMUL encoded in the instruction for the index values and
  the SEW/LMUL encoded in `vtype` for the data values.


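Vector unit-stride and constant-stride loads and stores use the EEW/EMUL
encoded in the instruction for the data values, while vector indexed
loads and stores use the EEW/EMUL encoded in the instruction for the
index values and the SEW/LMUL encoded in `vtype` for the data values.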

..
  Vector loads and stores are encoded using width values that are not
  claimed by the standard scalar floating-point loads and stores.


Vector loads and stores are encoded using width values that are not
claimed by the standard scalar floating-point loads and stores.

..
  The `mew` bit (`inst[28]`) is expected to be used to encode expanded
  memory sizes of 128 bits and above, but these encodings are *reserved*
  at this point.


The `mew` bit (`inst[28]`) is expected to be used to encode expanded
memory sizes of 128 bits and above, but these encodings are currently
*reserved*.

..
  Vector loads and stores for EEWs of all supported SEW settings must be
  provided in an implementation.  Vector load/store encodings for
  unsupported EEW widths are reserved.


An implementation must provide vector loads and stores for the EEWs of
all supported SEW settings.  Vector load/store encodings for
unsupported EEW widths are reserved.


..
  Mem bits is the size of each element accessed in memory.


Mem bits is the size of each element accessed in memory.

..
  Data reg bits is the size of each data element accessed in register.


Data reg bits is the size of each data element accessed in a register.

..
  Index bits is the size of each index accessed in register.


Index bits is the size of each index accessed in a register.

..
  Data and index bit EEW encodings larger than 64b are currently reserved.


Data and index EEW encodings larger than 64b are currently reserved.

..
  NOTE: RV128 will require data and index EEW of 128.


.. note::

  RV128 will require data and index EEW of 128.
  
**********************************************
Vector Unit-Stride Instructions
**********************************************

..
  ----
      # Vector unit-stride loads and stores
  
      # vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>)
      vle8.v    vd, (rs1), vm  #    8-bit unit-stride load
      vle16.v   vd, (rs1), vm  #   16-bit unit-stride load
      vle32.v   vd, (rs1), vm  #   32-bit unit-stride load
      vle64.v   vd, (rs1), vm  #   64-bit unit-stride load
      # vle128.v  vd, (rs1), vm  #  128-bit unit-stride load. Reserved
      # vle256.v  vd, (rs1), vm  #  256-bit unit-stride load. Reserved
      # vle512.v  vd, (rs1), vm  #  512-bit unit-stride load. Reserved
      # vle1024.v vd, (rs1), vm  # 1024-bit unit-stride load. Reserved
  
      # vs3 store data, rs1 base address, vm is mask encoding (v0.t or <missing>)
      vse8.v    vs3, (rs1), vm  #    8-bit unit-stride store
      vse16.v   vs3, (rs1), vm  #   16-bit unit-stride store
      vse32.v   vs3, (rs1), vm  #   32-bit unit-stride store
      vse64.v   vs3, (rs1), vm  #   64-bit unit-stride store
      # vse128.v  vs3, (rs1), vm  #  128-bit unit-stride store. Reserved
      # vse256.v  vs3, (rs1), vm  #  256-bit unit-stride store. Reserved
      # vse512.v  vs3, (rs1), vm  #  512-bit unit-stride store. Reserved
      # vse1024.v vs3, (rs1), vm  # 1024-bit unit-stride store. Reserved
  ----


::

      # Vector unit-stride loads and stores

      # vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>)
      vle8.v    vd, (rs1), vm  #    8-bit unit-stride load
      vle16.v   vd, (rs1), vm  #   16-bit unit-stride load
      vle32.v   vd, (rs1), vm  #   32-bit unit-stride load
      vle64.v   vd, (rs1), vm  #   64-bit unit-stride load
      # vle128.v  vd, (rs1), vm  #  128-bit unit-stride load. Reserved
      # vle256.v  vd, (rs1), vm  #  256-bit unit-stride load. Reserved
      # vle512.v  vd, (rs1), vm  #  512-bit unit-stride load. Reserved
      # vle1024.v vd, (rs1), vm  # 1024-bit unit-stride load. Reserved

      # vs3 store data, rs1 base address, vm is mask encoding (v0.t or <missing>)
      vse8.v    vs3, (rs1), vm  #    8-bit unit-stride store
      vse16.v   vs3, (rs1), vm  #   16-bit unit-stride store
      vse32.v   vs3, (rs1), vm  #   32-bit unit-stride store
      vse64.v   vs3, (rs1), vm  #   64-bit unit-stride store
      # vse128.v  vs3, (rs1), vm  #  128-bit unit-stride store. Reserved
      # vse256.v  vs3, (rs1), vm  #  256-bit unit-stride store. Reserved
      # vse512.v  vs3, (rs1), vm  #  512-bit unit-stride store. Reserved
      # vse1024.v vs3, (rs1), vm  # 1024-bit unit-stride store. Reserved
  

..
  An additional unit-stride load and store is provided to support
  transferring mask values to/from memory.  These operate the
  same as unmasked byte loads or stores (EEW=8), except that the effective
  vector length is ``evl``=ceil(``vl``/8) (i.e. EMUL=1), and the destination register is
  always written with a tail-agnostic policy.


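An additional unit-stride load and store is provided to transfer mask
values to and from memory.  These instructions operate the same as
unmasked byte loads or stores (EEW=8), except that the effective vector
length is ``evl`` = ceil(``vl``/8) (i.e., EMUL=1), and the destination
register is always written with a tail-agnostic policy.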
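
As a small illustration of the effective vector length, the helper
below (a hypothetical name, not part of the specification) computes the
number of bytes a mask load or store transfers for a given ``vl``.

.. code-block:: python

  def mask_evl(vl):
      """Bytes transferred by a mask load/store: evl = ceil(vl / 8)."""
      return (vl + 7) // 8  # integer form of ceil(vl / 8) for vl >= 0

With ``vl`` = 9, nine mask bits span two bytes, so ``mask_evl(9)``
evaluates to 2.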

..
  ----
      # Vector unit-stride mask load
      vlm.v vd, (rs1)   #  Load byte vector of length ceil(vl/8)
  
      # Vector unit-stride mask store
      vsm.v vs3, (rs1)  #  Store byte vector of length ceil(vl/8)
  ----


::

      # Vector unit-stride mask load
      vlm.v vd, (rs1)   #  Load byte vector of length ceil(vl/8)

      # Vector unit-stride mask store
      vsm.v vs3, (rs1)  #  Store byte vector of length ceil(vl/8)
  

..
  `vlm.v` and `vsm.v` are encoded with `width[2:0]`=0, like
  `vle8.v` and `vse8.v`; they are distinguished by different
  `lumop` and `sumop` encodings.  Since `vlm.v` and `vsm.v` operate as byte loads and stores,
  `vstart` is in units of bytes for these instructions.


`vlm.v` and `vsm.v` are encoded with `width[2:0]` = 0, like
`vle8.v` and `vse8.v`; they are distinguished by different
`lumop` and `sumop` encodings.  Since `vlm.v` and `vsm.v` operate as
byte loads and stores, `vstart` is in units of bytes for these
instructions.

..
  NOTE: The previous assembler mnemonics `vle1.v` and `vse1.v` were
  confusing as length was handled different for these instructions
  versus other element load/store instructions.  To avoid software
  churn, these older assembly mnemonics are being retained as aliases.


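.. note::

  The previous assembler mnemonics `vle1.v` and `vse1.v` were
  confusing, as length was handled differently for these instructions
  than for other element load/store instructions.  To avoid software
  churn, these older assembly mnemonics are being retained as aliases.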
  
..
  NOTE: The primary motivation to provide mask load and store is to
  support machines that internally rearrange data to reduce
  cross-datapath wiring.  However, this also provides a convenient
  mechanism to access packed bit vectors in memory as mask registers,
  and reduces the cost of mask spill/fill by reducing need to change
  `vl`.


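.. note::

  The primary motivation for providing mask load and store is to
  support machines that internally rearrange data to reduce
  cross-datapath wiring.  However, these instructions also provide a
  convenient mechanism to access packed bit vectors in memory as mask
  registers, and reduce the cost of mask spill/fill by reducing the
  need to change `vl`.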
  

**********************************
Vector Strided Instructions
**********************************

..
  ----
      # Vector strided loads and stores
  
      # vd destination, rs1 base address, rs2 byte stride
      vlse8.v    vd, (rs1), rs2, vm  #    8-bit strided load
      vlse16.v   vd, (rs1), rs2, vm  #   16-bit strided load
      vlse32.v   vd, (rs1), rs2, vm  #   32-bit strided load
      vlse64.v   vd, (rs1), rs2, vm  #   64-bit strided load
      # vlse128.v  vd, (rs1), rs2, vm  #  128-bit strided load. Reserved
      # vlse256.v  vd, (rs1), rs2, vm  #  256-bit strided load. Reserved
      # vlse512.v  vd, (rs1), rs2, vm  #  512-bit strided load. Reserved
      # vlse1024.v vd, (rs1), rs2, vm  # 1024-bit strided load. Reserved
  
      # vs3 store data, rs1 base address, rs2 byte stride
      vsse8.v    vs3, (rs1), rs2, vm  #    8-bit strided store
      vsse16.v   vs3, (rs1), rs2, vm  #   16-bit strided store
      vsse32.v   vs3, (rs1), rs2, vm  #   32-bit strided store
      vsse64.v   vs3, (rs1), rs2, vm  #   64-bit strided store
      # vsse128.v  vs3, (rs1), rs2, vm  #  128-bit strided store. Reserved
      # vsse256.v  vs3, (rs1), rs2, vm  #  256-bit strided store. Reserved
      # vsse512.v  vs3, (rs1), rs2, vm  #  512-bit strided store. Reserved
      # vsse1024.v vs3, (rs1), rs2, vm  # 1024-bit strided store. Reserved
  ----


::

      # Vector strided loads and stores

      # vd destination, rs1 base address, rs2 byte stride
      vlse8.v    vd, (rs1), rs2, vm  #    8-bit strided load
      vlse16.v   vd, (rs1), rs2, vm  #   16-bit strided load
      vlse32.v   vd, (rs1), rs2, vm  #   32-bit strided load
      vlse64.v   vd, (rs1), rs2, vm  #   64-bit strided load
      # vlse128.v  vd, (rs1), rs2, vm  #  128-bit strided load. Reserved
      # vlse256.v  vd, (rs1), rs2, vm  #  256-bit strided load. Reserved
      # vlse512.v  vd, (rs1), rs2, vm  #  512-bit strided load. Reserved
      # vlse1024.v vd, (rs1), rs2, vm  # 1024-bit strided load. Reserved

      # vs3 store data, rs1 base address, rs2 byte stride
      vsse8.v    vs3, (rs1), rs2, vm  #    8-bit strided store
      vsse16.v   vs3, (rs1), rs2, vm  #   16-bit strided store
      vsse32.v   vs3, (rs1), rs2, vm  #   32-bit strided store
      vsse64.v   vs3, (rs1), rs2, vm  #   64-bit strided store
      # vsse128.v  vs3, (rs1), rs2, vm  #  128-bit strided store. Reserved
      # vsse256.v  vs3, (rs1), rs2, vm  #  256-bit strided store. Reserved
      # vsse512.v  vs3, (rs1), rs2, vm  #  512-bit strided store. Reserved
      # vsse1024.v vs3, (rs1), rs2, vm  # 1024-bit strided store. Reserved
  

..
  Negative and zero strides are supported.


Negative and zero strides are supported.
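
A rough Python model of the strided address pattern is shown below.
The function name and the list-of-booleans mask are assumptions made for
illustration, and address wrap-around at XLEN is ignored.

.. code-block:: python

  def strided_addresses(base, stride, vl, mask=None):
      """Byte address of each active element i: base + i * stride."""
      if mask is None:
          mask = [True] * vl  # unmasked: every element is active
      return [base + i * stride for i in range(vl) if mask[i]]

A negative stride walks downward from the base address, and a zero
stride makes every active element refer to the same address.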

..
  Element accesses within a strided instruction are unordered with
  respect to each other.


Element accesses within a strided instruction are unordered with
respect to each other.

..
  When `rs2`=`x0`, then an implementation is allowed, but not required,
  to perform fewer memory operations than the number of active elements,
  and may perform different numbers of memory operations across
  different dynamic executions of the same static instruction.


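When `rs2` = `x0`, an implementation is allowed, but not required, to
perform fewer memory operations than the number of active elements, and
may perform different numbers of memory operations across different
dynamic executions of the same static instruction.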

..
  NOTE: Compilers must be aware to not use the `x0` form for rs2 when
  the immediate stride is `0` if the intent to is to require all memory
  accesses are performed.


.. note::

  Compilers must take care not to use the `x0` form for `rs2` when the
  immediate stride is `0` if the intent is to require that all memory
  accesses are performed.
  
..
  When `rs2!=x0` and the value of `x[rs2]=0`, the implementation must
  perform one memory access for each active element (but these accesses
  will not be ordered).


When `rs2!=x0` and the value of `x[rs2]=0`, the implementation must
perform one memory access for each active element (but these accesses
will not be ordered).

..
  NOTE: When repeating ordered vector accesses to the same memory
  address are required, then an ordered indexed operation can be used.


.. note::

  When repeated ordered vector accesses to the same memory address are
  required, an ordered indexed operation can be used.
  

*************************************
Vector Indexed Instructions
*************************************

..
  ----
      # Vector indexed loads and stores
  
      # Vector indexed-unordered load instructions
      # vd destination, rs1 base address, vs2 indices
      vluxei8.v    vd, (rs1), vs2, vm  # unordered  8-bit indexed load of SEW data
      vluxei16.v   vd, (rs1), vs2, vm  # unordered 16-bit indexed load of SEW data
      vluxei32.v   vd, (rs1), vs2, vm  # unordered 32-bit indexed load of SEW data
      vluxei64.v   vd, (rs1), vs2, vm  # unordered 64-bit indexed load of SEW data
  
      # Vector indexed-ordered load instructions
      # vd destination, rs1 base address, vs2 indices
      vloxei8.v    vd, (rs1), vs2, vm  # ordered  8-bit indexed load of SEW data
      vloxei16.v   vd, (rs1), vs2, vm  # ordered 16-bit indexed load of SEW data
      vloxei32.v   vd, (rs1), vs2, vm  # ordered 32-bit indexed load of SEW data
      vloxei64.v   vd, (rs1), vs2, vm  # ordered 64-bit indexed load of SEW data
  
      # Vector indexed-unordered store instructions
      # vs3 store data, rs1 base address, vs2 indices
      vsuxei8.v   vs3, (rs1), vs2, vm # unordered  8-bit indexed store of SEW data
      vsuxei16.v  vs3, (rs1), vs2, vm # unordered 16-bit indexed store of SEW data
      vsuxei32.v  vs3, (rs1), vs2, vm # unordered 32-bit indexed store of SEW data
      vsuxei64.v  vs3, (rs1), vs2, vm # unordered 64-bit indexed store of SEW data
  
      # Vector indexed-ordered store instructions
      # vs3 store data, rs1 base address, vs2 indices
      vsoxei8.v    vs3, (rs1), vs2, vm  # ordered  8-bit indexed store of SEW data
      vsoxei16.v   vs3, (rs1), vs2, vm  # ordered 16-bit indexed store of SEW data
      vsoxei32.v   vs3, (rs1), vs2, vm  # ordered 32-bit indexed store of SEW data
      vsoxei64.v   vs3, (rs1), vs2, vm  # ordered 64-bit indexed store of SEW data
  
  ----


::

      # Vector indexed loads and stores

      # Vector indexed-unordered load instructions
      # vd destination, rs1 base address, vs2 indices
      vluxei8.v    vd, (rs1), vs2, vm  # unordered  8-bit indexed load of SEW data
      vluxei16.v   vd, (rs1), vs2, vm  # unordered 16-bit indexed load of SEW data
      vluxei32.v   vd, (rs1), vs2, vm  # unordered 32-bit indexed load of SEW data
      vluxei64.v   vd, (rs1), vs2, vm  # unordered 64-bit indexed load of SEW data

      # Vector indexed-ordered load instructions
      # vd destination, rs1 base address, vs2 indices
      vloxei8.v    vd, (rs1), vs2, vm  # ordered  8-bit indexed load of SEW data
      vloxei16.v   vd, (rs1), vs2, vm  # ordered 16-bit indexed load of SEW data
      vloxei32.v   vd, (rs1), vs2, vm  # ordered 32-bit indexed load of SEW data
      vloxei64.v   vd, (rs1), vs2, vm  # ordered 64-bit indexed load of SEW data

      # Vector indexed-unordered store instructions
      # vs3 store data, rs1 base address, vs2 indices
      vsuxei8.v   vs3, (rs1), vs2, vm # unordered  8-bit indexed store of SEW data
      vsuxei16.v  vs3, (rs1), vs2, vm # unordered 16-bit indexed store of SEW data
      vsuxei32.v  vs3, (rs1), vs2, vm # unordered 32-bit indexed store of SEW data
      vsuxei64.v  vs3, (rs1), vs2, vm # unordered 64-bit indexed store of SEW data

      # Vector indexed-ordered store instructions
      # vs3 store data, rs1 base address, vs2 indices
      vsoxei8.v    vs3, (rs1), vs2, vm  # ordered  8-bit indexed store of SEW data
      vsoxei16.v   vs3, (rs1), vs2, vm  # ordered 16-bit indexed store of SEW data
      vsoxei32.v   vs3, (rs1), vs2, vm  # ordered 32-bit indexed store of SEW data
      vsoxei64.v   vs3, (rs1), vs2, vm  # ordered 64-bit indexed store of SEW data
  
  
..
  NOTE: The assembler syntax for indexed loads and stores uses
  ``ei``**x** instead of ``e``**x** to indicate the statically encoded EEW
  is of the index not the data.


.. note::

  The assembler syntax for indexed loads and stores uses
  ``ei``\ **x** instead of ``e``\ **x** to indicate that the statically
  encoded EEW is that of the index, not the data.
  
..
  NOTE: The indexed operations mnemonics have a "U" or "O" to
  distinguish between unordered and ordered, while the other vector
  addressing modes have no character. While this is perhaps a little
  less consistent, this approach minimizes disruption to existing
  software, as VSXEI previously meant "ordered" - and the opcode can be
  retained as an alias during transition to help reduce software churn.


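.. note::

  The indexed operation mnemonics have a "U" or "O" to distinguish
  between unordered and ordered, while the other vector addressing
  modes have no such character.  While this is perhaps a little less
  consistent, this approach minimizes disruption to existing software,
  as VSXEI previously meant "ordered", and the opcode can be retained
  as an alias during the transition to help reduce software churn.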
  

*****************************************************
Unit-Stride Fault-Only-First Loads
*****************************************************

..
  The unit-stride fault-only-first load instructions are used to
  vectorize loops with data-dependent exit conditions ("while" loops).
  These instructions execute as a regular load except that they will
  only take a trap caused by a synchronous exception on element 0.  If
  element 0 raises an exception, `vl` is not modified, and the trap is
  taken.  If an element > 0 raises an exception, the corresponding trap
  is not taken, and the vector length `vl` is reduced to the index of
  the element that would have raised an exception.


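The unit-stride fault-only-first load instructions are used to
vectorize loops with data-dependent exit conditions ("while" loops).
These instructions execute as a regular load except that they will
only take a trap caused by a synchronous exception on element 0.  If
element 0 raises an exception, `vl` is not modified, and the trap is
taken.  If an element > 0 raises an exception, the corresponding trap
is not taken, and the vector length `vl` is reduced to the index of
the element that would have raised an exception.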
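
The trimming behaviour can be modelled in a few lines of Python.  This
is a simplified sketch for byte-wide elements only: the ``Trap`` class,
the function name, and the use of a dictionary in which a missing
address stands in for a faulting access are all assumptions made for
illustration.

.. code-block:: python

  class Trap(Exception):
      """Hypothetical stand-in for a synchronous exception trap."""

  def fault_only_first_load(memory, base, vl):
      """Model an 8-bit unit-stride fault-only-first load.

      Returns (loaded values, new vl).  A fault on element 0 raises Trap
      and leaves vl alone; a fault on a later element trims vl instead.
      """
      loaded = []
      for i in range(vl):
          addr = base + i
          if addr not in memory:       # this element would fault
              if i == 0:
                  raise Trap(f"fault on element 0 at {addr:#x}")
              return loaded, i         # trim vl to the faulting index
          loaded.append(memory[addr])
      return loaded, vl

If only three bytes are mapped and ``vl`` is 8, the model returns the
three loaded bytes together with a new ``vl`` of 3 rather than trapping.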

..
  Load instructions may overwrite active destination vector register
  group elements past the element index at which the trap is reported.
  Similarly, fault-only-first load instructions may update destination
  elements past the element that causes trimming of the vector length
  (but not past the original vector length).  The values of these
  spurious updates do not have to correspond to the values in memory at
  the addressed memory locations.  Non-idempotent memory locations can
  only be accessed when it is known the corresponding element load
  operation will not be restarted due to a trap or vector length
  trimming.


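Load instructions may overwrite active destination vector register
group elements past the element index at which the trap is reported.
Similarly, fault-only-first load instructions may update destination
elements past the element that causes trimming of the vector length
(but not past the original vector length).  The values of these
spurious updates do not have to correspond to the values in memory at
the addressed memory locations.  Non-idempotent memory locations can
only be accessed when it is known that the corresponding element load
operation will not be restarted due to a trap or vector length
trimming.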

..
  ----
      # Vector unit-stride fault-only-first loads
  
      # vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>)
      vle8ff.v    vd, (rs1), vm  #    8-bit unit-stride fault-only-first load
      vle16ff.v   vd, (rs1), vm  #   16-bit unit-stride fault-only-first load
      vle32ff.v   vd, (rs1), vm  #   32-bit unit-stride fault-only-first load
      vle64ff.v   vd, (rs1), vm  #   64-bit unit-stride fault-only-first load
      # vle128ff.v  vd, (rs1), vm  #  128-bit unit-stride fault-only-first load. Reserved
      # vle256ff.v  vd, (rs1), vm  #  256-bit unit-stride fault-only-first load. Reserved
      # vle512ff.v  vd, (rs1), vm  #  512-bit unit-stride fault-only-first load. Reserved
      # vle1024ff.v vd, (rs1), vm  # 1024-bit unit-stride fault-only-first load. Reserved
  ----


::

      # Vector unit-stride fault-only-first loads

      # vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>)
      vle8ff.v    vd, (rs1), vm  #    8-bit unit-stride fault-only-first load
      vle16ff.v   vd, (rs1), vm  #   16-bit unit-stride fault-only-first load
      vle32ff.v   vd, (rs1), vm  #   32-bit unit-stride fault-only-first load
      vle64ff.v   vd, (rs1), vm  #   64-bit unit-stride fault-only-first load
      # vle128ff.v  vd, (rs1), vm  #  128-bit unit-stride fault-only-first load. Reserved
      # vle256ff.v  vd, (rs1), vm  #  256-bit unit-stride fault-only-first load. Reserved
      # vle512ff.v  vd, (rs1), vm  #  512-bit unit-stride fault-only-first load. Reserved
      # vle1024ff.v vd, (rs1), vm  # 1024-bit unit-stride fault-only-first load. Reserved
  

..
  ----
  strlen example using unit-stride fault-only-first instruction
  
  include::example/strlen.s[lines=4..-1]
  ----


::

  strlen example using unit-stride fault-only-first instruction
  
  include::example/strlen.s[lines=4..-1]
  

..
  NOTE: There is a security concern with fault-on-first loads, as they
  can be used to probe for valid effective addresses.  Strided and
  scatter/gather fault-only-first instructions are not provided due to
  lack of encoding space, and they can also represent a larger security
  hole, allowing software to easily check multiple random pages for
  accessibility without experiencing a trap. The unit-stride versions
  only allow probing a region immediately contiguous to a known region,
  and so do not appreciably impact security.  It is possible that
  security mitigations can be implemented to allow fault-only-first
  variants of non-contiguous accesses in future vector extensions.


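.. note::

  There is a security concern with fault-only-first loads, as they can
  be used to probe for valid effective addresses.  Strided and
  scatter/gather fault-only-first instructions are not provided due to
  lack of encoding space, and they can also represent a larger security
  hole, allowing software to easily check multiple random pages for
  accessibility without experiencing a trap.  The unit-stride versions
  only allow probing a region immediately contiguous to a known region,
  and so do not appreciably impact security.  It is possible that
  security mitigations can be implemented to allow fault-only-first
  variants of non-contiguous accesses in future vector extensions.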
  
..
  Even when an exception is not raised, implementations are permitted to process
  fewer than `vl` elements and reduce `vl` accordingly, but if `vstart`=0 and
  `vl`>0, then at least one element must be processed.


Even when an exception is not raised, implementations are permitted to
process fewer than `vl` elements and reduce `vl` accordingly, but if
`vstart` = 0 and `vl` > 0, then at least one element must be processed.

..
  When the fault-only-first instruction takes a trap due to an
  interrupt, implementations should not reduce `vl` and should instead
  set a `vstart` value.


When the fault-only-first instruction takes a trap due to an
interrupt, implementations should not reduce `vl` and should instead
set a `vstart` value.

..
  NOTE: When the fault-only-first instruction would trigger a debug
  data-watchpoint trap on an element after the first, implementations
  should not reduce `vl` but instead should trigger the debug trap as
  otherwise the event might be lost.


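.. note::

  When the fault-only-first instruction would trigger a debug
  data-watchpoint trap on an element after the first, implementations
  should not reduce `vl` but should instead trigger the debug trap, as
  otherwise the event might be lost.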
  
.. _sec-aos:


****************************************************
Vector Load/Store Segment Instructions
****************************************************

..
  This instruction subset is given the ISA string name `Zvlsseg`.


This instruction subset is given the ISA string name `Zvlsseg`.

..
  The vector load/store segment instructions move multiple contiguous
  fields in memory to and from consecutively numbered vector registers.


The vector load/store segment instructions move multiple contiguous
fields in memory to and from consecutively numbered vector registers.

..
  NOTE: These operations support operations on "array-of-structures"
  datatypes by unpacking each field in a structure into separate vector
  registers.


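.. note::

  These instructions support operations on "array-of-structures"
  datatypes by unpacking each field in a structure into a separate
  vector register.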
  
..
  The three-bit `nf` field in the vector instruction encoding is an
  unsigned integer that contains one less than the number of fields per
  segment, *NFIELDS*.


The three-bit `nf` field in the vector instruction encoding is an
unsigned integer that contains one less than the number of fields per
segment, *NFIELDS*.

..
  The EMUL setting must be such that EMUL * NFIELDS {le} 8, otherwise
  the instruction encoding is reserved.


The EMUL setting must be such that EMUL * NFIELDS ≤ 8; otherwise
the instruction encoding is reserved.

..
  NOTE: The product EMUL * NFIELDS represents the number of underlying
  vector registers that will be touched by a segmented load or store
  instruction.  This constraint makes this total no larger than 1/4 of
  the architectural register file, and the same as for regular
  operations with EMUL=8.


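.. note::

  The product EMUL * NFIELDS represents the number of underlying
  vector registers that will be touched by a segmented load or store
  instruction.  This constraint makes this total no larger than 1/4 of
  the architectural register file, and the same as for regular
  operations with EMUL=8.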

..
  Each field will be held in successively numbered vector register
  groups.  When EMUL>1, each field will occupy a vector register group
  held in multiple successively numbered vector registers, and the
  vector register group for each field must follow the usual vector
  register alignment constraints (e.g., when EMUL=2 and NFIELDS=4, each
  field's vector register group must start at an even vector register,
  but does not have to start at a multiple of 8 vector register number).


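Each field will be held in successively numbered vector register
groups.  When EMUL > 1, each field will occupy a vector register group
held in multiple successively numbered vector registers, and the
vector register group for each field must follow the usual vector
register alignment constraints (e.g., when EMUL=2 and NFIELDS=4, each
field's vector register group must start at an even vector register,
but does not have to start at a vector register number that is a
multiple of 8).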

..
  If the vector register numbers accessed by the segment load or store
  would increment past 31, then the instruction encoding is reserved.


If the vector register numbers accessed by the segment load or store
would increment past 31, then the instruction encoding is reserved.
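
The three register constraints above (EMUL * NFIELDS ≤ 8, group
alignment, and no register number past 31) can be combined into one
check.  The sketch below is illustrative only; the function name and
argument layout are assumptions, and a fractional EMUL is treated as
occupying one register per field.

.. code-block:: python

  from fractions import Fraction

  def segment_encoding_is_legal(nf, emul, vd):
      """Check the register constraints of a segment load/store.

      nf   -- encoded 3-bit field, i.e. NFIELDS - 1
      emul -- EMUL of each field's vector register group
      vd   -- first vector register number (vd or vs3)
      """
      nfields = nf + 1
      emul = Fraction(emul)
      if emul * nfields > 8:
          return False                    # touches more than 8 registers
      regs_per_field = max(1, int(emul))  # fractional EMUL still uses one register
      if vd % regs_per_field != 0:
          return False                    # group not aligned to EMUL
      return vd + nfields * regs_per_field - 1 <= 31

With EMUL=2 and NFIELDS=4 (``nf`` = 3), starting at ``v2`` is legal
because 2 is even, even though it is not a multiple of 8.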

..
  NOTE: This constraint is to help allow for forward-compatibility with
  a possible future longer instruction encoding that has more
  addressable vector registers.


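.. note::

  This constraint is to help allow for forward-compatibility with a
  possible future longer instruction encoding that has more
  addressable vector registers.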
  
..
  The `vl` register gives the number of structures to move, which is
  equal to the number of elements transferred to each vector register
  group.  Masking is also applied at the level of whole structures.


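The `vl` register gives the number of structures to move, which is
equal to the number of elements transferred to each vector register
group.  Masking is also applied at the level of whole structures.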

..
  For segment loads and stores, the individual memory accesses used to
  access fields within each segment are unordered with respect to each
  other even for ordered indexed segment loads and stores.


For segment loads and stores, the individual memory accesses used to
access fields within each segment are unordered with respect to each
other, even for ordered indexed segment loads and stores.

..
  If a trap is taken, `vstart` is in units of structures.
  If a trap occurs partway through accessing a structure, it is
  implementation-defined whether a subset of the structure access is performed.


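If a trap is taken, `vstart` is in units of structures.
If a trap occurs partway through accessing a structure, it is
implementation-defined whether a subset of the structure access is
performed.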


===============================================================================
Vector Unit-Stride Segment Loads and Stores
===============================================================================

..
  The vector unit-stride load and store segment instructions move packed
  contiguous segments ("array-of-structures") into multiple destination
  vector register groups.


The vector unit-stride load and store segment instructions move packed
contiguous segments ("array-of-structures") into multiple destination
vector register groups.

..
  NOTE: For structures with heterogeneous-sized fields, software can
  later unpack structure fields from a segment using additional
  instructions after the segment load brings data into the vector
  registers.


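.. note::

  For structures with heterogeneous-sized fields, software can later
  unpack structure fields from a segment using additional instructions
  after the segment load brings data into the vector registers.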
  
..
  The assembler prefixes `vlseg`/`vsseg` are used for unit-stride
  segment loads and stores respectively.


The assembler prefixes `vlseg`/`vsseg` are used for unit-stride
segment loads and stores, respectively.

..
  ----
      # Format
      vlseg<nf>e<eew>.v vd, (rs1), vm       # Unit-stride segment load template
      vsseg<nf>e<eew>.v vs3, (rs1), vm       # Unit-stride segment store template
  
      # Examples
      vlseg8e8.v vd, (rs1), vm   # Load eight vector registers with eight byte fields.
  
      vsseg3e32.v vs3, (rs1), vm  # Store packed vector of 3*4-byte segments from vs3,vs3+1,vs3+2 to memory
  ----


::

      # Format
      vlseg<nf>e<eew>.v vd, (rs1), vm       # Unit-stride segment load template
      vsseg<nf>e<eew>.v vs3, (rs1), vm       # Unit-stride segment store template

      # Examples
      vlseg8e8.v vd, (rs1), vm   # Load eight vector registers with eight byte fields.

      vsseg3e32.v vs3, (rs1), vm  # Store packed vector of 3*4-byte segments from vs3,vs3+1,vs3+2 to memory
  

..
  For loads, the `vd` register will hold the first field loaded from the
  segment.  For stores, the `vs3` register is read to provide the first
  field to be stored in each segment.


For loads, the `vd` register will hold the first field loaded from the
segment.  For stores, the `vs3` register is read to provide the first
field to be stored in each segment.
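
The de-interleaving performed by a unit-stride segment load can be
pictured with the following Python sketch.  It abstracts away the
element width by treating memory as a flat list of field values; the
function name is hypothetical.

.. code-block:: python

  def unit_stride_segment_load(memory, nfields, vl):
      """Split packed segments into one list per field.

      result[f][i] is field f of segment i, so result[0] models vd,
      result[1] models vd+1, and so on (for EMUL=1).
      """
      return [[memory[i * nfields + f] for i in range(vl)]
              for f in range(nfields)]

For packed RGB pixels, ``unit_stride_segment_load(pixels, 3, vl)``
yields the red, green, and blue values as three separate lists.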

..
  ----
      # Example 1
      # Memory structure holds packed RGB pixels (24-bit data structure, 8bpp)
      vsetvli a1, t0, e8, ta, ma
      vlseg3e8.v v8, (a0), vm
      # v8 holds the red pixels
      # v9 holds the green pixels
      # v10 holds the blue pixels
  
      # Example 2
      # Memory structure holds complex values, 32b for real and 32b for imaginary
      vsetvli a1, t0, e32, ta, ma
      vlseg2e32.v v8, (a0), vm
      # v8 holds real
      # v9 holds imaginary
  ----


::

      # Example 1
      # Memory structure holds packed RGB pixels (24-bit data structure, 8bpp)
      vsetvli a1, t0, e8, ta, ma
      vlseg3e8.v v8, (a0), vm
      # v8 holds the red pixels
      # v9 holds the green pixels
      # v10 holds the blue pixels

      # Example 2
      # Memory structure holds complex values, 32b for real and 32b for imaginary
      vsetvli a1, t0, e32, ta, ma
      vlseg2e32.v v8, (a0), vm
      # v8 holds real
      # v9 holds imaginary
  

..
  There are also fault-only-first versions of the unit-stride instructions.


There are also fault-only-first versions of the unit-stride instructions.

..
  ----
      # Template for vector fault-only-first unit-stride segment loads.
      vlseg<nf>e<eew>ff.v vd, (rs1),  vm          # Unit-stride fault-only-first segment loads
  ----


::

      # Template for vector fault-only-first unit-stride segment loads.
      vlseg<nf>e<eew>ff.v vd, (rs1),  vm          # Unit-stride fault-only-first segment loads
  

..
  For fault-only-first segment loads, if an exception is detected partway
  through accessing a segment, regardless of whether the element index is zero,
  it is implementation-defined whether a subset of the segment is loaded.


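For fault-only-first segment loads, if an exception is detected partway
through accessing a segment, regardless of whether the element index is
zero, it is implementation-defined whether a subset of the segment is
loaded.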

..
  These instructions may overwrite destination vector register group
  elements past the point at which a trap is reported or past the point
  at which vector length is trimmed.


These instructions may overwrite destination vector register group
elements past the point at which a trap is reported or past the point
at which vector length is trimmed.


===================================================================
Vector Strided Segment Loads and Stores
===================================================================

..
  Vector strided segment loads and stores move contiguous segments where
  each segment is separated by the byte-stride offset given in the `rs2`
  GPR argument.


Vector strided segment loads and stores move contiguous segments where
each segment is separated by the byte-stride offset given in the `rs2`
GPR argument.

..
  NOTE: Negative and zero strides are supported.

.. note::

  Negative and zero strides are supported.
  
..
  ----
      # Format
      vlsseg<nf>e<eew>.v vd, (rs1), rs2, vm          # Strided segment loads
      vssseg<nf>e<eew>.v vs3, (rs1), rs2, vm         # Strided segment stores
  
      # Examples
      vsetvli a1, t0, e8, ta, ma
      vlsseg3e8.v v4, (x5), x6   # Load bytes at addresses x5+i*x6   into v4[i],
                                #  and bytes at addresses x5+i*x6+1 into v5[i],
                                #  and bytes at addresses x5+i*x6+2 into v6[i].
  
      # Examples
      vsetvli a1, t0, e32, ta, ma
      vssseg2e32.v v2, (x5), x6   # Store words from v2[i] to address x5+i*x6
                                  #   and words from v3[i] to address x5+i*x6+4
  ----


::

      # Format
      vlsseg<nf>e<eew>.v vd, (rs1), rs2, vm          # Strided segment loads
      vssseg<nf>e<eew>.v vs3, (rs1), rs2, vm         # Strided segment stores

      # Example
      vsetvli a1, t0, e8, ta, ma
      vlsseg3e8.v v4, (x5), x6   # Load bytes at addresses x5+i*x6   into v4[i],
                                #  and bytes at addresses x5+i*x6+1 into v5[i],
                                #  and bytes at addresses x5+i*x6+2 into v6[i].

      # Example
      vsetvli a1, t0, e32, ta, ma
      vssseg2e32.v v2, (x5), x6   # Store words from v2[i] to address x5+i*x6
                                  #   and words from v3[i] to address x5+i*x6+4
  

..
  Accesses to the fields within each segment can occur in any order,
  including the case where the byte stride is such that segments overlap
  in memory.


Accesses to the fields within each segment can occur in any order,
including the case where the byte stride is such that segments overlap
in memory.
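
The address of every field touched by a strided segment access follows
directly from the base, the byte stride, and the field width.  The
helper below is a hypothetical sketch that ignores masking and address
wrap-around.

.. code-block:: python

  def strided_segment_addresses(base, stride, nfields, eew_bits, vl):
      """addresses[f][i] is the byte address of field f of segment i."""
      eew_bytes = eew_bits // 8
      return [[base + i * stride + f * eew_bytes for i in range(vl)]
              for f in range(nfields)]

With two 32-bit fields, field 0 of segment ``i`` sits at
``base + i*stride`` and field 1 at ``base + i*stride + 4``.  A stride
smaller than the segment size makes consecutive segments overlap.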


======================================================================
Vector Indexed Segment Loads and Stores
======================================================================

..
  Vector indexed segment loads and stores move contiguous segments where
  each segment is located at an address given by adding the scalar base
  address in the `rs1` field to byte offsets in vector register `vs2`.
  Both ordered and unordered forms are provided, where the ordered forms
  access segments in element order.  However, even for the ordered form,
  accesses to the fields within an individual segment are not ordered
  with respect to each other.


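Vector indexed segment loads and stores move contiguous segments where
each segment is located at an address given by adding the scalar base
address in the `rs1` field to byte offsets in vector register `vs2`.
Both ordered and unordered forms are provided, where the ordered forms
access segments in element order.  However, even for the ordered form,
accesses to the fields within an individual segment are not ordered
with respect to each other.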

..
  The data vector register group has EEW=SEW, EMUL=LMUL, while the index
  vector register group has EEW encoded in the instruction with
  EMUL=(EEW/SEW)*LMUL.


The data vector register group has EEW=SEW, EMUL=LMUL, while the index
vector register group has EEW encoded in the instruction with
EMUL = (EEW/SEW) * LMUL.
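
For indexed segment accesses the field addresses depend on the offsets
held in `vs2` and on the data width SEW.  The sketch below uses a
made-up function name, takes the offsets as a list of non-negative
integers, and ignores masking.

.. code-block:: python

  def indexed_segment_addresses(base, offsets, nfields, sew_bits):
      """addresses[f][i]: byte address of field f of segment i.

      offsets  -- byte offsets, one per segment (the contents of vs2)
      sew_bits -- data element width; data EEW equals SEW
      """
      sew_bytes = sew_bits // 8
      return [[base + off + f * sew_bytes for off in offsets]
              for f in range(nfields)]

With SEW=8 and three fields, each offset selects three consecutive
bytes, one per destination register.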

..
  ----
      # Format
      vluxseg<nf>ei<eew>.v vd, (rs1), vs2, vm   # Indexed-unordered segment loads
      vloxseg<nf>ei<eew>.v vd, (rs1), vs2, vm   # Indexed-ordered segment loads
      vsuxseg<nf>ei<eew>.v vs3, (rs1), vs2, vm  # Indexed-unordered segment stores
      vsoxseg<nf>ei<eew>.v vs3, (rs1), vs2, vm  # Indexed-ordered segment stores
  
      # Examples
      vsetvli a1, t0, e8, ta, ma
      vluxseg3ei32.v v4, (x5), v3   # Load bytes at addresses x5+v3[i]   into v4[i],
                                #  and bytes at addresses x5+v3[i]+1 into v5[i],
                                #  and bytes at addresses x5+v3[i]+2 into v6[i].
  
      # Examples
      vsetvli a1, t0, e32, ta, ma
      vsuxseg2ei32.v v2, (x5), v5   # Store words from v2[i] to address x5+v5[i]
                                #   and words from v3[i] to address x5+v5[i]+4
  ----


::

      # Format
      vluxseg<nf>ei<eew>.v vd, (rs1), vs2, vm   # Indexed-unordered segment loads
      vloxseg<nf>ei<eew>.v vd, (rs1), vs2, vm   # Indexed-ordered segment loads
      vsuxseg<nf>ei<eew>.v vs3, (rs1), vs2, vm  # Indexed-unordered segment stores
      vsoxseg<nf>ei<eew>.v vs3, (rs1), vs2, vm  # Indexed-ordered segment stores

      # Example
      vsetvli a1, t0, e8, ta, ma
      vluxseg3ei32.v v4, (x5), v3   # Load bytes at addresses x5+v3[i]   into v4[i],
                                #  and bytes at addresses x5+v3[i]+1 into v5[i],
                                #  and bytes at addresses x5+v3[i]+2 into v6[i].

      # Example
      vsetvli a1, t0, e32, ta, ma
      vsuxseg2ei32.v v2, (x5), v5   # Store words from v2[i] to address x5+v5[i]
                                #   and words from v3[i] to address x5+v5[i]+4
  

..
  For vector indexed segment loads, the destination vector register
  groups cannot overlap the source vector register group (specified by
  `vs2`), else the instruction encoding is reserved.


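For vector indexed segment loads, the destination vector register groups cannot overlap
the source vector register group (specified by `vs2`); otherwise the instruction encoding is reserved.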

..
  NOTE: This constraint supports restart of indexed segment loads
  that raise exceptions partway through loading a structure.


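.. note::

  This constraint supports restart of indexed segment loads
  that raise exceptions partway through loading a structure.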
  
*******************************************************
Vector Load/Store Whole Register Instructions
*******************************************************

..
  Format for Vector Load Whole Register Instructions under LOAD-FP major opcode


Format for Vector Load Whole Register Instructions under LOAD-FP major opcode

..
  31 29  28  27 26  25 24   20 19       15 14   12 11      7 6     0
   nf  | mew|  00  | 1| 01000 |    rs1    | width |    vd   |0000111| VL<nf>R


```wavedrom
{reg: [
  {bits: 7, name: 0x07, attr: 'VL*R*'},
  {bits: 5, name: 'vd', attr: 'destination of load', type: 2},
  {bits: 3, name: 'width'},
  {bits: 5, name: 'rs1', attr: 'base address', type: 4},
  {bits: 5, name: 8, attr: 'lumop'},
  {bits: 1, name: 1, attr: 'vm'},
  {bits: 2, name: 0x10000, attr: 'mop'},
  {bits: 1, name: 'mew'},
  {bits: 3, name: 'nf'},
]}
```

..
  Format for Vector Store Whole Register Instructions under STORE-FP major opcode
  31 29  28  27 26  25  24  20 19       15 14   12 11      7 6     0
   nf  |  0 |  00  | 1| 01000 |    rs1    |  000  |   vs3   |0100111| VS<nf>R


Format for Vector Store Whole Register Instructions under STORE-FP major opcode

```wavedrom
{reg: [
  {bits: 7, name: 0x27, attr: 'VS*R*'},
  {bits: 5, name: 'vs3', attr: 'store data', type: 2},
  {bits: 3, name: 0x1000},
  {bits: 5, name: 'rs1', attr: 'base address', type: 4},
  {bits: 5, name: 8, attr: 'sumop'},
  {bits: 1, name: 1, attr: 'vm'},
  {bits: 2, name: 0x100, attr: 'mop'},
  {bits: 1, name: 0x100, attr: 'mew'},
  {bits: 3, name: 'nf'},
]}
```

..
  These instructions load and store whole vector register groups.


These instructions load and store whole vector register groups.

..
  NOTE: These instructions are intended to be used to save and restore
  vector registers when the type or length of the current contents of
  the vector register is not known, or where modifying `vl` and `vtype`
  would be costly. Examples include compiler register spills, vector
  function calls where values are passed in vector registers, interrupt
  handlers, and OS context switches.  Software can determine the number
  of bytes transferred by reading the `vlenb` register.


.. note::

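  These instructions are intended to be used to save and restore vector registers when the type or length
  of the current contents of the vector register is not known, or where modifying `vl` and `vtype` would be costly.
  Examples include compiler register spills, vector function calls where values are passed in vector registers,
  interrupt handlers, and OS context switches.
  Software can determine the number of bytes transferred by reading the `vlenb` register.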
  
..
  The load instructions have an EEW encoded in the `mew` and `width`
  fields following the pattern of regular unit-stride loads.


The load instructions have an EEW encoded in the `mew` and `width` fields,
following the pattern of regular unit-stride loads.

..
  NOTE: Because in-register byte layouts are identical to in-memory byte
  layouts, the same data is written to the destination register group
  regardless of EEW.
  Hence, it would have sufficed to provide only EEW=8 variants.
  The full set of EEW variants is provided so that the encoded EEW can be used
  as a hint to indicate the destination register group will next be accessed
  with this EEW, which aids implementations that rearrange data internally.


.. note::

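  Because in-register byte layouts are identical to in-memory byte layouts,
  the same data is written to the destination register group regardless of EEW.
  Hence, it would have sufficed to provide only EEW=8 variants.
  The full set of EEW variants is provided so that the encoded EEW can be used as a hint
  that the destination register group will next be accessed with this EEW,
  which aids implementations that rearrange data internally.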
  
..
  The vector whole register store instructions are encoded similar to
  unmasked unit-stride store of elements with EEW=8.


The vector whole register store instructions are encoded similarly to
unmasked unit-stride stores of elements with EEW=8.

..
  The `nf` field encodes how many vector registers to load and store.
  The encoded number of registers must be a power of 2 and the vector
  register numbers must be aligned as with a vector register group,
  otherwise the instruction encoding is reserved.  The `nf` field
  encodes the number of vector registers to transfer, numbered
  successively after the base.  Only `nf` values of 1, 2, 4, 8 are
  supported, with other values reserved.  When multiple registers are
  transferred, the lowest-numbered vector register is held in the
  lowest-numbered memory addresses and successive vector register
  numbers are placed contiguously in memory.


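The `nf` field encodes how many vector registers to load and store.
The encoded number of registers must be a power of 2,
and the vector register numbers must be aligned as with a vector register group;
otherwise the instruction encoding is reserved.
The registers to transfer are numbered successively, starting from the base register.
Only `nf` values of 1, 2, 4, and 8 are supported, with other values reserved.
When multiple registers are transferred, the lowest-numbered vector register is held in the lowest-numbered memory addresses,
and successive vector register numbers are placed contiguously in memory.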
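
The register-count and alignment rule above can be sketched as a small check. This is an illustrative model only: `whole_register_group` and `register_addresses` are hypothetical helper names, `num_regs` stands for the encoded number of registers, and `vlenb` is the register size in bytes.

```python
def whole_register_group(first_reg, num_regs):
    """Registers moved by a whole-register load/store, or None if reserved.

    first_reg is the register number in the vd (load) or vs3 (store) field.
    """
    if num_regs not in (1, 2, 4, 8):
        return None  # only 1, 2, 4, and 8 registers are supported
    if first_reg % num_regs != 0:
        return None  # must be aligned like a vector register group
    return list(range(first_reg, first_reg + num_regs))


def register_addresses(first_reg, num_regs, base, vlenb):
    """Map each transferred register to the address of its first byte."""
    regs = whole_register_group(first_reg, num_regs)
    if regs is None:
        return None
    # The lowest-numbered register sits at the lowest address; the rest
    # follow contiguously, one register (vlenb bytes) apart.
    return {reg: base + i * vlenb for i, reg in enumerate(regs)}


print(whole_register_group(4, 4))
print(whole_register_group(3, 2))
print(register_addresses(2, 2, base=0x100, vlenb=16))
```

Under this model a two-register transfer starting at `v3` is rejected because 3 is not a multiple of 2, which mirrors the alignment requirement for an LMUL=2 register group.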

..
  The instructions operate with an effective vector length,
  `evl`=`nf`*VLEN/EEW, regardless of current settings in `vtype` and
  `vl`.  The usual property that no elements are written if `vstart`
  {ge} `vl` does not apply to these instructions.  Instead, no elements
  are written if `vstart` {ge} `evl`.


The instructions operate with an effective vector length, `evl`=`nf`*VLEN/EEW,
regardless of the current settings in `vtype` and `vl`.
The usual property that no elements are written if `vstart` ≥ `vl`
does not apply to these instructions.
Instead, no elements are written if `vstart` ≥ `evl`.
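
The effective vector length formula can be made concrete with a short calculation. This is a sketch under assumed values: the helper names are hypothetical, `num_regs` plays the role of `nf` in the formula, and VLEN=128 is only an example.

```python
def whole_register_evl(num_regs, vlen, eew):
    """Effective vector length: evl = nf * VLEN / EEW."""
    return num_regs * vlen // eew


def elements_written(vstart, num_regs, vlen, eew):
    """Element indices a whole-register load writes for a given vstart."""
    evl = whole_register_evl(num_regs, vlen, eew)
    return list(range(vstart, evl))  # empty when vstart >= evl


# Two registers of 128 bits accessed as 32-bit elements.
print(whole_register_evl(2, 128, 32))
print(elements_written(6, 2, 128, 32))
print(elements_written(8, 2, 128, 32))
```

Note that `vl` does not appear anywhere in the calculation, which is the point of the rule: only `vstart` and `evl` decide which elements are written.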

..
  The instructions operate similarly to unmasked unit-stride load and
  store instructions of elements, with the base address passed in the
  scalar `x` register specified by `rs1`.


The instructions operate similarly to unmasked unit-stride load and store instructions of elements,
with the base address passed in the scalar `x` register specified by `rs1`.

..
  Implementations are allowed to raise a misaligned address exception on
  whole register loads and stores if the base address is not naturally
  aligned to the larger of the size of the encoded EEW in bytes (EEW/8)
  or the implementation's smallest supported SEW size in bytes
  (SEW~MIN~/8).


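Implementations are allowed to raise a misaligned address exception on whole register loads and stores
if the base address is not naturally aligned to the larger of the size of the encoded EEW in bytes (EEW/8)
or the implementation's smallest supported SEW size in bytes (SEW\ :sub:`MIN`/8).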
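
The alignment bound above is the larger of two byte sizes, which a short sketch makes explicit. The helper names are hypothetical, and `sew_min` stands for the implementation's smallest supported SEW in bits.

```python
def required_alignment_bytes(eew, sew_min):
    """Alignment below which an implementation may raise an exception."""
    return max(eew // 8, sew_min // 8)


def may_raise_misaligned(base, eew, sew_min):
    """True if the base address permits a misaligned address exception.

    An implementation is allowed, not required, to raise the exception.
    """
    return base % required_alignment_bytes(eew, sew_min) != 0


# EEW=8 is encoded, but the implementation supports no SEW below 32.
print(required_alignment_bytes(8, 32))
print(may_raise_misaligned(0x1002, 8, 32))
print(may_raise_misaligned(0x1004, 32, 8))
```

The first case shows why the second term matters: a byte-wide encoding can still be subject to a 4-byte alignment bound on a subset implementation.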

..
  NOTE: Allowing misaligned exceptions to be raised based on
  non-alignment to encoded EEW simplifies the implementation of these
  instructions.  Some subset implementations might not support smaller
  SEW widths, so are allowed to report misaligned exceptions for the
  smallest supported SEW even if larger than encoded EEW.  An extreme
  implementation might have SEW~MIN~>XLEN for example.  Software
  environments can mandate the minimum alignment requirements to support
  an ABI.


.. note::

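  Allowing misaligned exceptions to be raised based on non-alignment to the encoded EEW
  simplifies the implementation of these instructions.
  Some subset implementations might not support smaller SEW widths,
  so they are allowed to report misaligned exceptions for the smallest supported SEW even if it is larger than the encoded EEW.
  An extreme implementation might have SEW\ :sub:`MIN` > XLEN, for example.
  Software environments can mandate the minimum alignment requirements to support an ABI.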
  
..
  ----
     # Format of whole register load and store instructions.
     vl1r.v v3, (a0)       # Pseudoinstruction equal to vl1re8.v
  
     vl1re8.v    v3, (a0)  # Load v3 with VLEN/8 bytes held at address in a0
     vl1re16.v   v3, (a0)  # Load v3 with VLEN/16 halfwords held at address in a0
     vl1re32.v   v3, (a0)  # Load v3 with VLEN/32 words held at address in a0
     vl1re64.v   v3, (a0)  # Load v3 with VLEN/64 doublewords held at address in a0
     # vl1re128.v  v3, (a0)
     # vl1re256.v  v3, (a0)
     # vl1re512.v  v3, (a0)
     # vl1re1024.v v3, (a0)
  
     vl2r.v v2, (a0)       # Pseudoinstruction equal to vl2re8.v v2, (a0)
  
     vl2re8.v    v2, (a0)  # Load v2-v3 with 2*VLEN/8 bytes from address in a0
     vl2re16.v   v2, (a0)  # Load v2-v3 with 2*VLEN/16 halfwords held at address in a0
     vl2re32.v   v2, (a0)  # Load v2-v3 with 2*VLEN/32 words held at address in a0
     vl2re64.v   v2, (a0)  # Load v2-v3 with 2*VLEN/64 doublewords held at address in a0
     # vl2re128.v  v2, (a0)
     # vl2re256.v  v2, (a0)
     # vl2re512.v  v2, (a0)
     # vl2re1024.v v2, (a0)
  
     vl4r.v v4, (a0)       # Pseudoinstruction equal to vl4re8.v
  
     vl4re8.v    v4, (a0)  # Load v4-v7 with 4*VLEN/8 bytes from address in a0
     vl4re16.v   v4, (a0)
     vl4re32.v   v4, (a0)
     vl4re64.v   v4, (a0)
     # vl4re128.v  v4, (a0)
     # vl4re256.v  v4, (a0)
     # vl4re512.v  v4, (a0)
     # vl4re1024.v v4, (a0)
  
     vl8r.v v8, (a0)       # Pseudoinstruction equal to vl8re8.v
  
     vl8re8.v    v8, (a0)  # Load v8-v15 with 8*VLEN/8 bytes from address in a0
     vl8re16.v   v8, (a0)
     vl8re32.v   v8, (a0)
     vl8re64.v   v8, (a0)
     # vl8re128.v  v8, (a0)
     # vl8re256.v  v8, (a0)
     # vl8re512.v  v8, (a0)
     # vl8re1024.v v8, (a0)
  
     vs1r.v v3, (a1)      # Store v3 to address in a1
     vs2r.v v2, (a1)      # Store v2-v3 to address in a1
     vs4r.v v4, (a1)      # Store v4-v7 to address in a1
     vs8r.v v8, (a1)      # Store v8-v15 to address in a1
  ----


::

     # Format of whole register load and store instructions.
     vl1r.v v3, (a0)       # Pseudoinstruction equal to vl1re8.v
  
     vl1re8.v    v3, (a0)  # Load v3 with VLEN/8 bytes held at address in a0
     vl1re16.v   v3, (a0)  # Load v3 with VLEN/16 halfwords held at address in a0
     vl1re32.v   v3, (a0)  # Load v3 with VLEN/32 words held at address in a0
     vl1re64.v   v3, (a0)  # Load v3 with VLEN/64 doublewords held at address in a0
     # vl1re128.v  v3, (a0)
     # vl1re256.v  v3, (a0)
     # vl1re512.v  v3, (a0)
     # vl1re1024.v v3, (a0)
  
     vl2r.v v2, (a0)       # Pseudoinstruction equal to vl2re8.v v2, (a0)
  
     vl2re8.v    v2, (a0)  # Load v2-v3 with 2*VLEN/8 bytes from address in a0
     vl2re16.v   v2, (a0)  # Load v2-v3 with 2*VLEN/16 halfwords held at address in a0
     vl2re32.v   v2, (a0)  # Load v2-v3 with 2*VLEN/32 words held at address in a0
     vl2re64.v   v2, (a0)  # Load v2-v3 with 2*VLEN/64 doublewords held at address in a0
     # vl2re128.v  v2, (a0)
     # vl2re256.v  v2, (a0)
     # vl2re512.v  v2, (a0)
     # vl2re1024.v v2, (a0)
  
     vl4r.v v4, (a0)       # Pseudoinstruction equal to vl4re8.v
  
     vl4re8.v    v4, (a0)  # Load v4-v7 with 4*VLEN/8 bytes from address in a0
     vl4re16.v   v4, (a0)
     vl4re32.v   v4, (a0)
     vl4re64.v   v4, (a0)
     # vl4re128.v  v4, (a0)
     # vl4re256.v  v4, (a0)
     # vl4re512.v  v4, (a0)
     # vl4re1024.v v4, (a0)
  
     vl8r.v v8, (a0)       # Pseudoinstruction equal to vl8re8.v
  
     vl8re8.v    v8, (a0)  # Load v8-v15 with 8*VLEN/8 bytes from address in a0
     vl8re16.v   v8, (a0)
     vl8re32.v   v8, (a0)
     vl8re64.v   v8, (a0)
     # vl8re128.v  v8, (a0)
     # vl8re256.v  v8, (a0)
     # vl8re512.v  v8, (a0)
     # vl8re1024.v v8, (a0)
  
     vs1r.v v3, (a1)      # Store v3 to address in a1
     vs2r.v v2, (a1)      # Store v2-v3 to address in a1
     vs4r.v v4, (a1)      # Store v4-v7 to address in a1
     vs8r.v v8, (a1)      # Store v8-v15 to address in a1
  

..
  NOTE: Implementations should raise illegal instruction exceptions on
  `vl<nf>r` instructions for EEW values that are not supported.


.. note::

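  Implementations should raise illegal instruction exceptions on
  `vl<nf>r` instructions for EEW values that are not supported.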
  
..
  NOTE: We have considered adding a whole register mask load
  instruction (``vl1re1.v vd, (rs1)``) as a mask hint but this is not currently on
  PoR.


.. note::

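  We have considered adding a whole register mask load instruction
  (``vl1re1.v vd, (rs1)``) as a mask hint, but this is not currently on the PoR.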
  

#################################################
Vector Memory Alignment Constraints
#################################################

..
  If an element accessed by a vector memory instruction is not naturally
  aligned to the size of the element, either the element is transferred
  successfully or an address misaligned exception is raised on that
  element.


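If an element accessed by a vector memory instruction is not naturally aligned to the size of the element,
either the element is transferred successfully or an address misaligned exception is raised on that element.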

..
  Support for misaligned vector memory accesses is independent of an
  implementation's support for misaligned scalar memory accesses.


Support for misaligned vector memory accesses is independent of
an implementation's support for misaligned scalar memory accesses.

..
  NOTE: An implementation may have neither, one, or both scalar and
  vector memory accesses support some or all misaligned accesses in
  hardware.  A separate PMA should be defined to determine if vector
  misaligned accesses are supported in the associated address range.


.. note::

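  An implementation may support some or all misaligned accesses in hardware
  for neither, one, or both of scalar and vector memory accesses.
  A separate PMA should be defined to determine if vector misaligned accesses are supported in the associated address range.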
  
..
  Vector misaligned memory accesses follow the same rules for atomicity
  as scalar misaligned memory accesses.


Vector misaligned memory accesses follow the same rules for atomicity
as scalar misaligned memory accesses.


####################################################
Vector Memory Consistency Model
####################################################

..
  Vector memory instructions appear to execute in program order on the
  local hart.


Vector memory instructions appear to execute in program order on the local hart.

..
  Vector memory instructions follow RVWMO at the instruction level.


Vector memory instructions follow RVWMO at the instruction level.

..
  Except for vector indexed-ordered loads and stores, element operations
  are unordered within the instruction.


Except for vector indexed-ordered loads and stores, element operations are unordered within the instruction.

..
  Vector indexed-ordered loads and stores read and write elements
  from/to memory in element order respectively.


Vector indexed-ordered loads and stores read and write elements
from and to memory in element order, respectively.

..
  NOTE: More formal definitions required.


.. note::

  More formal definitions required.
  
..
  Instructions affected by the vector length register `vl` have a control
  dependency on `vl`, rather than a data dependency.
  Similarly, masked vector instructions have a control dependency on the source
  mask register, rather than a data dependency.


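Instructions affected by the vector length register `vl` have a control dependency on `vl`,
rather than a data dependency.
Similarly, masked vector instructions have a control dependency on the source mask register, rather than a data dependency.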

..
  NOTE: Treating the vector length and mask as control rather than data
  typically matches the semantics of the corresponding scalar code, where branch
  instructions ordinarily would have been used.
  Treating the mask as control allows masked vector load instructions to access
  memory before the mask value is known, without the need for
  a misspeculation-recovery mechanism.


.. note::

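  Treating the vector length and mask as control rather than data typically matches the semantics
  of the corresponding scalar code, where branch instructions ordinarily would have been used.
  Treating the mask as control allows masked vector load instructions
  to access memory before the mask value is known,
  without the need for a misspeculation-recovery mechanism.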
  

###############################
Vector Arithmetic Instructions
###############################

..
  The vector arithmetic instructions use a new major opcode (OP-V =
  1010111~2~) which neighbors OP-FP.  The three-bit `funct3` field is
  used to define sub-categories of vector instructions.


The vector arithmetic instructions use a new major opcode (OP-V = 1010111\ :sub:`2`), which neighbors OP-FP.
The three-bit `funct3` field is used to define sub-categories of vector instructions.

include::valu-format.adoc[]

.. _sec-arithmetic-encoding:


*******************************************************
Vector Arithmetic Instruction Encoding
*******************************************************

..
  The `funct3` field encodes the operand type and source locations.


The `funct3` field encodes the operand type and source locations.

..


..
  Integer operations are performed using unsigned or two's-complement
  signed integer arithmetic depending on the opcode.


Integer operations are performed using unsigned or two's-complement signed integer arithmetic, depending on the opcode.

..
  NOTE: In this discussion, fixed-point operations are
  considered to be integer operations.


.. note::

  In this discussion, fixed-point operations are considered to be integer operations.
  
..
  All standard vector floating-point arithmetic operations follow the
  IEEE-754/2008 standard.  All vector floating-point operations use the
  dynamic rounding mode in the `frm` register.  Use of the `frm` field
  when it contains an invalid rounding mode by any vector floating-point
  instruction, even those that do not depend on the rounding mode, or
  when `vl`=0, or when `vstart` {ge} `vl`, is reserved.


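All standard vector floating-point arithmetic operations follow the IEEE-754/2008 standard.
All vector floating-point operations use the dynamic rounding mode in the `frm` register.
Use of the `frm` field by any vector floating-point instruction while it contains an invalid rounding mode is reserved;
this holds even for instructions that do not depend on the rounding mode, and even when `vl`=0 or `vstart` ≥ `vl`.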

..
  NOTE: All vector floating-point code will rely on a valid value in
  `frm`.  Implementations can make all vector FP instructions report
  exceptions when the rounding mode is invalid to simplify control
  logic.


.. note::

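  All vector floating-point code will rely on a valid value in `frm`.
  Implementations can make all vector FP instructions report exceptions when the rounding mode is invalid to simplify control logic.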
  
..
  Vector-vector operations take two vectors of operands from vector
  register groups specified by `vs2` and `vs1` respectively.


Vector-vector operations take two vectors of operands from the vector register groups
specified by `vs2` and `vs1` respectively.

..
  Vector-scalar operations can have three possible forms, but in all
  cases take one vector of operands from a vector register group
  specified by `vs2` and a second scalar source operand from one of
  three alternative sources.


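Vector-scalar operations can have three possible forms, but in all cases
take one vector of operands from a vector register group specified by `vs2`
and a second scalar source operand from one of three alternative sources.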

..
  * For integer operations, the scalar can be a 5-bit immediate encoded
  in the `rs1` field.  The value is sign-extended to SEW bits, unless
  otherwise specified.  . For integer operations, the scalar can be
  taken from the scalar `x` register specified by `rs1`.  If XLEN>SEW,
  the least-significant SEW bits of the `x` register are used, unless
  otherwise specified.  If XLEN<SEW, the value from the `x` register is
  sign-extended to SEW bits.  For floating-point operations, the
  scalar can be taken from a scalar `f` register.  If FLEN > SEW, the
  value in the `f` registers is checked for a valid NaN-boxed value, in
  which case the least-significant SEW bits of the `f` register are
  used, else the canonical NaN value is used.  Vector instructions where
  any floating-point vector operand's EEW is not a supported
  floating-point type width (which includes when FLEN < SEW) are
  reserved.


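* For integer operations, the scalar can be a 5-bit immediate encoded in the `rs1` field.
  The value is sign-extended to SEW bits, unless otherwise specified.
* For integer operations, the scalar can be taken from the scalar `x` register specified by `rs1`.
  If XLEN>SEW, the least-significant SEW bits of the `x` register are used, unless otherwise specified.
  If XLEN<SEW, the value from the `x` register is sign-extended to SEW bits.
* For floating-point operations, the scalar can be taken from a scalar `f` register.
  If FLEN > SEW, the value in the `f` register is checked for a valid NaN-boxed value,
  in which case the least-significant SEW bits of the `f` register are used; otherwise the canonical NaN value is used.
  Vector instructions where any floating-point vector operand's EEW is not a supported floating-point type width (which includes when FLEN < SEW) are reserved.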
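
The three scalar sources listed above can be modelled on raw bit patterns. This is a minimal sketch, not the specification's definition: the function names are hypothetical, values are Python integers holding unsigned bit patterns, and the canonical NaN table covers only the 16-, 32-, and 64-bit IEEE formats.

```python
# Canonical quiet NaN bit patterns for half, single, and double precision.
CANONICAL_NAN = {16: 0x7E00, 32: 0x7FC00000, 64: 0x7FF8000000000000}


def mask(bits):
    return (1 << bits) - 1


def imm5_to_sew(imm5, sew):
    """Sign-extend the 5-bit immediate field to SEW bits."""
    imm5 &= 0x1F
    value = imm5 - 32 if imm5 & 0x10 else imm5
    return value & mask(sew)


def xreg_to_sew(x, xlen, sew):
    """Truncate (XLEN >= SEW) or sign-extend (XLEN < SEW) an x register."""
    x &= mask(xlen)
    if xlen >= sew:
        return x & mask(sew)
    signed = x - (1 << xlen) if x >> (xlen - 1) else x
    return signed & mask(sew)


def freg_to_sew(f, flen, sew):
    """Unbox a NaN-boxed scalar from an f register, else canonical NaN."""
    if flen < sew:
        raise ValueError("reserved: FLEN < SEW")
    f &= mask(flen)
    if flen == sew:
        return f
    # A valid NaN-boxed value has all upper FLEN-SEW bits set to 1.
    if f >> sew == mask(flen - sew):
        return f & mask(sew)
    return CANONICAL_NAN[sew]


print(hex(imm5_to_sew(0b11111, 8)))
print(hex(xreg_to_sew(0x80000000, 32, 64)))
print(hex(freg_to_sew(0xFFFFFFFF3F800000, 64, 32)))
print(hex(freg_to_sew(0x000000003F800000, 64, 32)))
```

The last two calls differ only in the upper 32 bits of the `f` register: the properly boxed value yields the single-precision pattern for 1.0, while the unboxed one is replaced by the canonical NaN.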

..
  NOTE: Some instructions *zero*-extend the 5-bit immediate, and denote this
  by naming the immediate `uimm` in the assembly syntax.


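.. note::

  Some instructions *zero*-extend the 5-bit immediate, and denote this
  by naming the immediate `uimm` in the assembly syntax.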
  
..
  NOTE: The proposed Zfinx variants will take the floating-point scalar
  argument from the `x` registers.


.. note::

  The proposed Zfinx variants will take the floating-point scalar
  argument from the `x` registers.
  
..
  Vector arithmetic instructions are masked under control of the `vm`
  field.


Vector arithmetic instructions are masked under control of the `vm` field.

..
  ----
  # Assembly syntax pattern for vector binary arithmetic instructions
  
  # Operations returning vector results, masked by vm (v0.t, <nothing>)
  vop.vv  vd, vs2, vs1, vm  # integer vector-vector      vd[i] = vs2[i] op vs1[i]
  vop.vx  vd, vs2, rs1, vm  # integer vector-scalar      vd[i] = vs2[i] op x[rs1]
  vop.vi  vd, vs2, imm, vm  # integer vector-immediate   vd[i] = vs2[i] op imm
  
  vfop.vv  vd, vs2, vs1, vm # FP vector-vector operation vd[i] = vs2[i] fop vs1[i]
  vfop.vf  vd, vs2, rs1, vm # FP vector-scalar operation vd[i] = vs2[i] fop f[rs1]
  ----


::

  # Assembly syntax pattern for vector binary arithmetic instructions
  
  # Operations returning vector results, masked by vm (v0.t, <nothing>)
  vop.vv  vd, vs2, vs1, vm  # integer vector-vector      vd[i] = vs2[i] op vs1[i]
  vop.vx  vd, vs2, rs1, vm  # integer vector-scalar      vd[i] = vs2[i] op x[rs1]
  vop.vi  vd, vs2, imm, vm  # integer vector-immediate   vd[i] = vs2[i] op imm
  
  vfop.vv  vd, vs2, vs1, vm # FP vector-vector operation vd[i] = vs2[i] fop vs1[i]
  vfop.vf  vd, vs2, rs1, vm # FP vector-scalar operation vd[i] = vs2[i] fop f[rs1]
  

..
  NOTE: In the encoding, `vs2` is the first operand, while `rs1/imm`
  is the second operand. This is the opposite to the standard scalar
  ordering.  This arrangement retains the existing encoding conventions
  that instructions that read only one scalar register, read it from
  `rs1`, and that 5-bit immediates are sourced from the `rs1` field.


.. note::

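  In the encoding, `vs2` is the first operand, while `rs1/imm` is the second operand.
  This is the opposite of the standard scalar ordering.
  This arrangement retains the existing encoding conventions that instructions that read only one scalar register read it from `rs1`,
  and that 5-bit immediates are sourced from the `rs1` field.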
  
..
  ----
  # Assembly syntax pattern for vector ternary arithmetic instructions (multiply-add)
  
  # Integer operations overwriting sum input
  vop.vv vd, vs1, vs2, vm  # vd[i] = vs1[i] * vs2[i] + vd[i]
  vop.vx vd, rs1, vs2, vm  # vd[i] = x[rs1] * vs2[i] + vd[i]
  
  # Integer operations overwriting product input
  vop.vv vd, vs1, vs2, vm  # vd[i] = vs1[i] * vd[i] + vs2[i]
  vop.vx vd, rs1, vs2, vm  # vd[i] = x[rs1] * vd[i] + vs2[i]
  
  # Floating-point operations overwriting sum input
  vfop.vv vd, vs1, vs2, vm  # vd[i] = vs1[i] * vs2[i] + vd[i]
  vfop.vf vd, rs1, vs2, vm  # vd[i] = f[rs1] * vs2[i] + vd[i]
  
  # Floating-point operations overwriting product input
  vfop.vv vd, vs1, vs2, vm  # vd[i] = vs1[i] * vd[i] + vs2[i]
  vfop.vf vd, rs1, vs2, vm  # vd[i] = f[rs1] * vd[i] + vs2[i]
  ----


::

  # Assembly syntax pattern for vector ternary arithmetic instructions (multiply-add)
  
  # Integer operations overwriting sum input
  vop.vv vd, vs1, vs2, vm  # vd[i] = vs1[i] * vs2[i] + vd[i]
  vop.vx vd, rs1, vs2, vm  # vd[i] = x[rs1] * vs2[i] + vd[i]
  
  # Integer operations overwriting product input
  vop.vv vd, vs1, vs2, vm  # vd[i] = vs1[i] * vd[i] + vs2[i]
  vop.vx vd, rs1, vs2, vm  # vd[i] = x[rs1] * vd[i] + vs2[i]
  
  # Floating-point operations overwriting sum input
  vfop.vv vd, vs1, vs2, vm  # vd[i] = vs1[i] * vs2[i] + vd[i]
  vfop.vf vd, rs1, vs2, vm  # vd[i] = f[rs1] * vs2[i] + vd[i]
  
  # Floating-point operations overwriting product input
  vfop.vv vd, vs1, vs2, vm  # vd[i] = vs1[i] * vd[i] + vs2[i]
  vfop.vf vd, rs1, vs2, vm  # vd[i] = f[rs1] * vd[i] + vs2[i]
  

..
  NOTE: For ternary multiply-add operations, the assembler syntax always
  places the destination vector register first, followed by either `rs1`
  or `vs1`, then `vs2`.  This ordering provides a more natural reading
  of the assembler for these ternary operations, as the multiply
  operands are always next to each other.


.. note::

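  For ternary multiply-add operations, the assembler syntax always places the destination vector register first,
  followed by either `rs1` or `vs1`, then `vs2`.
  This ordering provides a more natural reading of the assembler for these ternary operations,
  as the multiply operands are always next to each other.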
  
.. _sec-widening:


****************************************
Vector Widening Arithmetic Instructions
****************************************

..
  A few vector arithmetic instructions are defined to be **widening**
  operations where the destination vector register group has EEW=2*SEW and
  EMUL=2*LMUL.


A few vector arithmetic instructions are defined to be **widening** operations, where the destination vector register group has EEW=2*SEW and EMUL=2*LMUL.

..
  The first vector register group operand can be either single or
  double-width. These are generally written with a `vw*` prefix on the
  opcode or `vfw*` for vector floating-point operations.


The first vector register group operand can be either single-width or double-width.
These instructions are generally written with a `vw*` prefix on the opcode,
or `vfw*` for vector floating-point operations.

..
  ----
  Assembly syntax pattern for vector widening arithmetic instructions
  
  # Double-width result, two single-width sources: 2*SEW = SEW op SEW
  vwop.vv  vd, vs2, vs1, vm  # integer vector-vector      vd[i] = vs2[i] op vs1[i]
  vwop.vx  vd, vs2, rs1, vm  # integer vector-scalar      vd[i] = vs2[i] op x[rs1]
  
  # Double-width result, first source double-width, second source single-width: 2*SEW = 2*SEW op SEW
  vwop.wv  vd, vs2, vs1, vm  # integer vector-vector      vd[i] = vs2[i] op vs1[i]
  vwop.wx  vd, vs2, rs1, vm  # integer vector-scalar      vd[i] = vs2[i] op x[rs1]
  ----


::

  # Assembly syntax pattern for vector widening arithmetic instructions
  
  # Double-width result, two single-width sources: 2*SEW = SEW op SEW
  vwop.vv  vd, vs2, vs1, vm  # integer vector-vector      vd[i] = vs2[i] op vs1[i]
  vwop.vx  vd, vs2, rs1, vm  # integer vector-scalar      vd[i] = vs2[i] op x[rs1]
  
  # Double-width result, first source double-width, second source single-width: 2*SEW = 2*SEW op SEW
  vwop.wv  vd, vs2, vs1, vm  # integer vector-vector      vd[i] = vs2[i] op vs1[i]
  vwop.wx  vd, vs2, rs1, vm  # integer vector-scalar      vd[i] = vs2[i] op x[rs1]
  

..
  NOTE: Originally, a `w` suffix was used on opcode, but this could be
  confused with the use of a `w` suffix to mean word-sized operations in
  doubleword integers, so the `w` was moved to prefix.


.. note::

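  Originally, a `w` suffix was used on the opcode, but this could be confused with the use of a `w` suffix
  to mean word-sized operations in doubleword integers, so the `w` was moved to the prefix.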
  
..
  NOTE: The floating-point widening operations were changed to `vfw*`
  from `vwf*` to be more consistent with any scalar widening
  floating-point operations that will be written as `fw*`.


.. note::

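  The floating-point widening operations were changed to `vfw*` from `vwf*`
  to be more consistent with any scalar widening floating-point operations that will be written as `fw*`.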
  
..
  NOTE: For integer multiply-add, another possible widening option
  increases the size of the accumulator to EEW=4*SEW (i.e., 4*SEW +=
  SEW*SEW).  These would be distinguished by a `vq*` prefix on the
  opcode, for quad-widening.  These are not included at this time, but
  are a possible addition in a future extension.


.. note::

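  For integer multiply-add, another possible widening option increases the size of the accumulator
  to EEW=4*SEW (i.e., 4*SEW += SEW*SEW).
  These would be distinguished by a `vq*` prefix on the opcode, for quad-widening.
  These are not included at this time, but are a possible addition in a future extension.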
  
..
  For all widening instructions, the destination EEW and EMUL values
  must be a supported configuration, otherwise the instruction encoding
  is reserved.


For all widening instructions, the destination EEW and EMUL values must be a supported configuration;
otherwise the instruction encoding is reserved.

..
  The destination vector register group must be specified using a vector
  register number that is valid for the destination's EMUL, otherwise the
  instruction encoding is reserved.


The destination vector register group must be specified using a vector register number that is valid for the destination's EMUL;
otherwise the instruction encoding is reserved.

..
  NOTE: This constraint is necessary to support restart with non-zero
  `vstart`.


.. note::

  This constraint is necessary to support restart with non-zero `vstart`.
  
..
  NOTE: For the `vw<op>.wv vd, vs2, vs1` format instructions, it is legal
  for vd to equal vs2.


.. note::

  For the `vw<op>.wv vd, vs2, vs1` format instructions, it is legal
  for `vd` to equal `vs2`.
  
.. _sec-narrowing:


****************************************
Vector Narrowing Arithmetic Instructions
****************************************

..
  A few instructions are provided to convert double-width source vectors
  into single-width destination vectors.  These instructions convert a
  vector register group with EEW/EMUL=2*SEW/2*LMUL to a vector register
  group with the current SEW/LMUL setting.


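A few instructions are provided to convert double-width source vectors into single-width destination vectors.
These instructions convert a vector register group with EEW/EMUL=2*SEW/2*LMUL
to a vector register group with the current SEW/LMUL setting.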

..
  If EEW > ELEN or EMUL > 8, the instruction encoding is reserved.


If EEW > ELEN or EMUL > 8, the instruction encoding is reserved.

..
  NOTE: An alternative design decision would have been to treat SEW/LMUL
  as defining the size of the source vector register group.  The choice
  here is motivated by the belief the chosen approach will require fewer
  `vtype` changes.


.. note::

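  An alternative design decision would have been to treat SEW/LMUL as defining the size of the source vector register group.
  The choice here is motivated by the belief that the chosen approach will require fewer `vtype` changes.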
  
..
  The source and destination vector register groups have to be specified
  with a vector register number that is legal for the source and
  destination EMUL values respectively, otherwise the instruction
  encoding is reserved.


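The source and destination vector register groups have to be specified with a vector register number
that is legal for the source and destination EMUL values respectively; otherwise the instruction encoding is reserved.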

..
  Where there is a second source vector register group (specified by
  `vs1`), this has the same (narrower) width as the result (i.e.,
  EEW=SEW).


Where there is a second source vector register group (specified by `vs1`),
this has the same (narrower) width as the result (i.e., EEW=SEW).

..
  NOTE: It is safe to overwrite a second source vector register group
  with the same EEW and EMUL as the result.


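.. note::

  It is safe to overwrite a second source vector register group
  with the same EEW and EMUL as the result.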
  
..
  A `vn*` prefix on the opcode is used to distinguish these instructions
  in the assembler, or a `vfn*` prefix for narrowing floating-point
  opcodes.  The double-width source vector register group is signified
  by a `w` in the source operand suffix (e.g., `vnsra.wv`)


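A `vn*` prefix on the opcode is used to distinguish these instructions in the assembler,
or a `vfn*` prefix for narrowing floating-point opcodes.
The double-width source vector register group is signified
by a `w` in the source operand suffix (e.g., `vnsra.wv`).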

..
  NOTE: Comparison operations that set a mask register are also
  implicitly a narrowing operation.


.. note::

  Comparison operations that set a mask register are also
  implicitly a narrowing operation.
  
.. _sec-vector-integer:


######################################
Vector Integer Arithmetic Instructions
######################################

..
  A set of vector integer arithmetic instructions is provided.


A set of vector integer arithmetic instructions is provided.


**********************************************
Vector Single-Width Integer Add and Subtract
**********************************************

..
  Vector integer add and subtract are provided.  Reverse-subtract
  instructions are also provided for the vector-scalar forms.


Vector integer add and subtract are provided.
Reverse-subtract instructions are also provided for the vector-scalar forms.

..
  ----
  # Integer adds.
  vadd.vv vd, vs2, vs1, vm   # Vector-vector
  vadd.vx vd, vs2, rs1, vm   # vector-scalar
  vadd.vi vd, vs2, imm, vm   # vector-immediate
  
  # Integer subtract
  vsub.vv vd, vs2, vs1, vm   # Vector-vector
  vsub.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Integer reverse subtract
  vrsub.vx vd, vs2, rs1, vm   # vd[i] = x[rs1] - vs2[i]
  vrsub.vi vd, vs2, imm, vm   # vd[i] = imm - vs2[i]
  ----


::

  # Integer adds.
  vadd.vv vd, vs2, vs1, vm   # Vector-vector
  vadd.vx vd, vs2, rs1, vm   # vector-scalar
  vadd.vi vd, vs2, imm, vm   # vector-immediate
  
  # Integer subtract
  vsub.vv vd, vs2, vs1, vm   # Vector-vector
  vsub.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Integer reverse subtract
  vrsub.vx vd, vs2, rs1, vm   # vd[i] = x[rs1] - vs2[i]
  vrsub.vi vd, vs2, imm, vm   # vd[i] = imm - vs2[i]
  

..
  NOTE: A vector of integer values can be negated using a
  reverse-subtract instruction with a scalar operand of `x0`. Can define
  assembly pseudoinstruction `vneg.v vd,vs` = `vrsub.vx vd,vs,x0`.


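A vector of integer values can be negated using a reverse-subtract instruction with a scalar operand of `x0`.
The assembly pseudoinstruction `vneg.v vd,vs` = `vrsub.vx vd,vs,x0` can be defined for this purpose.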
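
The negation idiom can be checked with a small element-wise model. This is a sketch only: `vrsub_vx` and `vneg` are hypothetical Python stand-ins for the instructions, elements are unsigned bit patterns of SEW bits, and masking and `vl` are not modelled.

```python
def vrsub_vx(vs2, x, sew):
    """Reverse subtract: vd[i] = x - vs2[i], wrapped to SEW bits."""
    m = (1 << sew) - 1
    return [(x - element) & m for element in vs2]


def vneg(vs, sew):
    """Negate each element by reverse-subtracting from x0 (always zero)."""
    return vrsub_vx(vs, 0, sew)


# 8-bit elements: 1 -> -1 (0xFF), 0 -> 0, -1 (0xFF) -> 1.
print([hex(e) for e in vneg([0x01, 0x00, 0xFF], sew=8)])
# The most negative value maps to itself under two's complement.
print([hex(e) for e in vneg([0x80], sew=8)])
```

Because the arithmetic wraps at SEW bits, negating the most negative value returns the same bit pattern, as with scalar two's-complement negation.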


*******************************************
Vector Widening Integer Add/Subtract
*******************************************

..
  The widening add/subtract instructions are provided in both signed and
  unsigned variants, depending on whether the narrower source operands
  are first sign- or zero-extended before forming the double-width sum.


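The widening add/subtract instructions are provided in both signed and unsigned variants,
depending on whether the narrower source operands are first sign-extended or zero-extended
before forming the double-width sum.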
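
The difference between the signed and unsigned variants shows up only in how the narrow operands are extended, which a short model makes visible. This is an illustrative sketch: the Python functions are hypothetical stand-ins named after the instructions, elements are unsigned bit patterns, and masking and `vl` are ignored.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vwaddu_vv(vs2, vs1, sew):
    """Unsigned widening add: 2*SEW = SEW + SEW, sources zero-extended."""
    narrow, wide = (1 << sew) - 1, (1 << 2 * sew) - 1
    return [((a & narrow) + (b & narrow)) & wide for a, b in zip(vs2, vs1)]


def vwadd_vv(vs2, vs1, sew):
    """Signed widening add: 2*SEW = SEW + SEW, sources sign-extended."""
    wide = (1 << 2 * sew) - 1
    return [(sign_extend(a, sew) + sign_extend(b, sew)) & wide
            for a, b in zip(vs2, vs1)]


def vwadd_wv(vs2, vs1, sew):
    """Signed widening add: 2*SEW = 2*SEW + SEW, vs2 already double-width."""
    wide = (1 << 2 * sew) - 1
    return [(sign_extend(a, 2 * sew) + sign_extend(b, sew)) & wide
            for a, b in zip(vs2, vs1)]


# The same 8-bit inputs give different 16-bit results.
print([hex(e) for e in vwaddu_vv([0xFF], [0xFF], sew=8)])
print([hex(e) for e in vwadd_vv([0xFF], [0xFF], sew=8)])
print([hex(e) for e in vwadd_wv([0x0100], [0xFF], sew=8)])
```

With 8-bit inputs of `0xFF`, the unsigned form treats both operands as 255 and the signed form treats them as -1, so the 16-bit results differ even though the source bits are identical.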

..
  ----
  # Widening unsigned integer add/subtract, 2*SEW = SEW +/- SEW
  vwaddu.vv  vd, vs2, vs1, vm  # vector-vector
  vwaddu.vx  vd, vs2, rs1, vm  # vector-scalar
  vwsubu.vv  vd, vs2, vs1, vm  # vector-vector
  vwsubu.vx  vd, vs2, rs1, vm  # vector-scalar
  
  # Widening signed integer add/subtract, 2*SEW = SEW +/- SEW
  vwadd.vv  vd, vs2, vs1, vm  # vector-vector
  vwadd.vx  vd, vs2, rs1, vm  # vector-scalar
  vwsub.vv  vd, vs2, vs1, vm  # vector-vector
  vwsub.vx  vd, vs2, rs1, vm  # vector-scalar
  
  # Widening unsigned integer add/subtract, 2*SEW = 2*SEW +/- SEW
  vwaddu.wv  vd, vs2, vs1, vm  # vector-vector
  vwaddu.wx  vd, vs2, rs1, vm  # vector-scalar
  vwsubu.wv  vd, vs2, vs1, vm  # vector-vector
  vwsubu.wx  vd, vs2, rs1, vm  # vector-scalar
  
  # Widening signed integer add/subtract, 2*SEW = 2*SEW +/- SEW
  vwadd.wv  vd, vs2, vs1, vm  # vector-vector
  vwadd.wx  vd, vs2, rs1, vm  # vector-scalar
  vwsub.wv  vd, vs2, vs1, vm  # vector-vector
  vwsub.wx  vd, vs2, rs1, vm  # vector-scalar
  ----


::

  # Widening unsigned integer add/subtract, 2*SEW = SEW +/- SEW
  vwaddu.vv  vd, vs2, vs1, vm  # vector-vector
  vwaddu.vx  vd, vs2, rs1, vm  # vector-scalar
  vwsubu.vv  vd, vs2, vs1, vm  # vector-vector
  vwsubu.vx  vd, vs2, rs1, vm  # vector-scalar
  
  # Widening signed integer add/subtract, 2*SEW = SEW +/- SEW
  vwadd.vv  vd, vs2, vs1, vm  # vector-vector
  vwadd.vx  vd, vs2, rs1, vm  # vector-scalar
  vwsub.vv  vd, vs2, vs1, vm  # vector-vector
  vwsub.vx  vd, vs2, rs1, vm  # vector-scalar
  
  # Widening unsigned integer add/subtract, 2*SEW = 2*SEW +/- SEW
  vwaddu.wv  vd, vs2, vs1, vm  # vector-vector
  vwaddu.wx  vd, vs2, rs1, vm  # vector-scalar
  vwsubu.wv  vd, vs2, vs1, vm  # vector-vector
  vwsubu.wx  vd, vs2, rs1, vm  # vector-scalar
  
  # Widening signed integer add/subtract, 2*SEW = 2*SEW +/- SEW
  vwadd.wv  vd, vs2, vs1, vm  # vector-vector
  vwadd.wx  vd, vs2, rs1, vm  # vector-scalar
  vwsub.wv  vd, vs2, vs1, vm  # vector-vector
  vwsub.wx  vd, vs2, rs1, vm  # vector-scalar
  

..
  NOTE: An integer value can be doubled in width using the widening add
  instructions with a scalar operand of `x0`.  Can define assembly
  pseudoinstructions `vwcvt.x.x.v vd,vs,vm = vwadd.vx vd,vs,x0,vm` and
  `vwcvtu.x.x.v vd,vs,vm = vwaddu.vx vd,vs,x0,vm`.


.. note::

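  An integer value can be doubled in width using the widening add instructions with a scalar operand of `x0`.
  The assembly pseudoinstructions `vwcvt.x.x.v vd,vs,vm = vwadd.vx vd,vs,x0,vm` and
  `vwcvtu.x.x.v vd,vs,vm = vwaddu.vx vd,vs,x0,vm` can be defined for this purpose.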
  

*************************
Vector Integer Extension
*************************

..
  The vector integer extension instructions zero- or sign-extend a
  source vector integer operand with EEW less than SEW to fill SEW-sized
  elements in the destination.  The EEW of the source is 1/2, 1/4, or
  1/8 of SEW, while EMUL of the source is (EEW/SEW)*LMUL.  The
  destination has EEW equal to SEW and EMUL equal to LMUL.


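The vector integer extension instructions zero- or sign-extend a
source vector integer operand with EEW less than SEW to fill SEW-sized
elements in the destination.  The EEW of the source is 1/2, 1/4, or
1/8 of SEW, while the EMUL of the source is (EEW/SEW)*LMUL.  The
destination has EEW equal to SEW and EMUL equal to LMUL.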

..
  ----
  vzext.vf2 vd, vs2, vm  # Zero-extend SEW/2 source to SEW destination
  vsext.vf2 vd, vs2, vm  # Sign-extend SEW/2 source to SEW destination
  vzext.vf4 vd, vs2, vm  # Zero-extend SEW/4 source to SEW destination
  vsext.vf4 vd, vs2, vm  # Sign-extend SEW/4 source to SEW destination
  vzext.vf8 vd, vs2, vm  # Zero-extend SEW/8 source to SEW destination
  vsext.vf8 vd, vs2, vm  # Sign-extend SEW/8 source to SEW destination
  ----


::

  vzext.vf2 vd, vs2, vm  # Zero-extend SEW/2 source to SEW destination
  vsext.vf2 vd, vs2, vm  # Sign-extend SEW/2 source to SEW destination
  vzext.vf4 vd, vs2, vm  # Zero-extend SEW/4 source to SEW destination
  vsext.vf4 vd, vs2, vm  # Sign-extend SEW/4 source to SEW destination
  vzext.vf8 vd, vs2, vm  # Zero-extend SEW/8 source to SEW destination
  vsext.vf8 vd, vs2, vm  # Sign-extend SEW/8 source to SEW destination
  

..
  If the source EEW is not a supported width, or source EMUL would be
  below the minimum legal LMUL, the instruction encoding is reserved.

If the source EEW is not a supported width, or the source EMUL would be
below the minimum legal LMUL, the instruction encoding is reserved.

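The element-wise effect of these instructions can be sketched in Python as follows. This is a simplified model under stated assumptions: `factor` is 2, 4, or 8 (matching the `vf2`/`vf4`/`vf8` suffixes), the function names are hypothetical, and masking, `vl`, and the reserved-encoding checks are ignored.

.. code-block:: python

  def vzext(src, sew, factor):
      """Zero-extend SEW/factor-bit source elements to SEW bits."""
      src_bits = sew // factor
      return [e & ((1 << src_bits) - 1) for e in src]


  def vsext(src, sew, factor):
      """Sign-extend SEW/factor-bit source elements to SEW bits."""
      src_bits = sew // factor
      out = []
      for e in src:
          e &= (1 << src_bits) - 1
          if e >> (src_bits - 1):          # source sign bit set
              e -= 1 << src_bits
          out.append(e & ((1 << sew) - 1))  # SEW-bit two's-complement pattern
      return out


  zero_extended = vzext([0x80, 0x7F], sew=32, factor=4)
  sign_extended = vsext([0x80, 0x7F], sew=32, factor=4)

With SEW=32 and `factor=4`, the 8-bit source value 0x80 becomes 0x00000080 under zero extension and 0xFFFFFF80 under sign extension.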

**********************************************************************
Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions
**********************************************************************

..
  To support multi-word integer arithmetic, instructions that operate on
  a carry bit are provided.  For each operation (add or subtract), two
  instructions are provided: one to provide the result (SEW width), and
  the second to generate the carry output (single bit encoded as a mask
  boolean).


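To support multi-word integer arithmetic, instructions that operate on
a carry bit are provided.  For each operation (add or subtract), two
instructions are provided: one produces the result (SEW bits wide), and
the other generates the carry output (a single bit encoded as a mask
boolean).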

..
  The carry inputs and outputs are represented using the mask register
  layout as described in Section :ref:`sec-mask-register-layout` .  Due to
  encoding constraints, the carry input must come from the implicit `v0`
  register, but carry outputs can be written to any vector register that
  respects the source/destination overlap restrictions.


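The carry inputs and outputs are represented using the mask register
layout described in Section :ref:`sec-mask-register-layout`.  Due to
encoding constraints, the carry input must come from the implicit `v0`
register, but carry outputs can be written to any vector register that
respects the source/destination overlap restrictions.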

..
  `vadc` and `vsbc` add or subtract the source operands and the carry-in or
  borrow-in, and write the result to vector register `vd`.
  These instructions are encoded as masked instructions (`vm=0`), but they operate
  on and write back all body elements.
  Encodings corresponding to the unmasked versions (`vm=1`) are reserved.


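`vadc` and `vsbc` add or subtract the source operands and the carry-in or
borrow-in, and write the result to vector register `vd`.
These instructions are encoded as masked instructions (`vm=0`), but they operate
on and write back all body elements.
Encodings corresponding to the unmasked versions (`vm=1`) are reserved.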

..
  `vmadc` and `vmsbc` add or subtract the source operands, optionally
  add the carry-in or subtract the borrow-in if masked (`vm=0`), and
  write the result back to mask register `vd`.  If unmasked (`vm=1`),
  there is no carry-in or borrow-in.  These instructions operate on and
  write back all body elements, even if masked.  Because these
  instructions produce a mask value, they always operate with a
  tail-agnostic policy.


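`vmadc` and `vmsbc` add or subtract the source operands, optionally
add the carry-in or subtract the borrow-in if masked (`vm=0`), and
write the result back to mask register `vd`.  If unmasked (`vm=1`),
there is no carry-in or borrow-in.  These instructions operate on and
write back all body elements, even if masked.  Because these
instructions produce a mask value, they always operate with a
tail-agnostic policy.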

..
  ----
   # Produce sum with carry.
  
   # vd[i] = vs2[i] + vs1[i] + v0.mask[i]
   vadc.vvm   vd, vs2, vs1, v0  # Vector-vector
  
   # vd[i] = vs2[i] + x[rs1] + v0.mask[i]
   vadc.vxm   vd, vs2, rs1, v0  # Vector-scalar
  
   # vd[i] = vs2[i] + imm + v0.mask[i]
   vadc.vim   vd, vs2, imm, v0  # Vector-immediate
  
   # Produce carry out in mask register format
  
   # vd.mask[i] = carry*out(vs2[i] + vs1[i] + v0.mask[i])
   vmadc.vvm   vd, vs2, vs1, v0  # Vector-vector
  
   # vd.mask[i] = carry*out(vs2[i] + x[rs1] + v0.mask[i])
   vmadc.vxm   vd, vs2, rs1, v0  # Vector-scalar
  
   # vd.mask[i] = carry*out(vs2[i] + imm + v0.mask[i])
   vmadc.vim   vd, vs2, imm, v0  # Vector-immediate
  
   # vd.mask[i] = carry*out(vs2[i] + vs1[i])
   vmadc.vv    vd, vs2, vs1      # Vector-vector, no carry-in
  
   # vd.mask[i] = carry*out(vs2[i] + x[rs1])
   vmadc.vx    vd, vs2, rs1      # Vector-scalar, no carry-in
  
   # vd.mask[i] = carry*out(vs2[i] + imm)
   vmadc.vi    vd, vs2, imm      # Vector-immediate, no carry-in
  ----


::

   # Produce sum with carry.
  
   # vd[i] = vs2[i] + vs1[i] + v0.mask[i]
   vadc.vvm   vd, vs2, vs1, v0  # Vector-vector
  
   # vd[i] = vs2[i] + x[rs1] + v0.mask[i]
   vadc.vxm   vd, vs2, rs1, v0  # Vector-scalar
  
   # vd[i] = vs2[i] + imm + v0.mask[i]
   vadc.vim   vd, vs2, imm, v0  # Vector-immediate
  
   # Produce carry out in mask register format
  
   # vd.mask[i] = carry_out(vs2[i] + vs1[i] + v0.mask[i])
   vmadc.vvm   vd, vs2, vs1, v0  # Vector-vector
  
   # vd.mask[i] = carry_out(vs2[i] + x[rs1] + v0.mask[i])
   vmadc.vxm   vd, vs2, rs1, v0  # Vector-scalar
  
   # vd.mask[i] = carry_out(vs2[i] + imm + v0.mask[i])
   vmadc.vim   vd, vs2, imm, v0  # Vector-immediate
  
   # vd.mask[i] = carry_out(vs2[i] + vs1[i])
   vmadc.vv    vd, vs2, vs1      # Vector-vector, no carry-in
  
   # vd.mask[i] = carry_out(vs2[i] + x[rs1])
   vmadc.vx    vd, vs2, rs1      # Vector-scalar, no carry-in
  
   # vd.mask[i] = carry_out(vs2[i] + imm)
   vmadc.vi    vd, vs2, imm      # Vector-immediate, no carry-in
  

..
  Because implementing a carry propagation requires executing two
  instructions with unchanged inputs, destructive accumulations will
  require an additional move to obtain correct results.


Because implementing a carry propagation requires executing two
instructions with unchanged inputs, destructive accumulations
require an additional move to obtain correct results.

..
  ----
    # Example multi-word arithmetic sequence, accumulating into v4
    vmadc.vvm v1, v4, v8, v0  # Get carry into temp register v1
    vadc.vvm v4, v4, v8, v0   # Calc new sum
    vmmv.m v0, v1             # Move temp carry into v0 for next word
  ----


::

    # Example multi-word arithmetic sequence, accumulating into v4
    vmadc.vvm v1, v4, v8, v0  # Get carry into temp register v1
    vadc.vvm v4, v4, v8, v0   # Calc new sum
    vmmv.m v0, v1             # Move temp carry into v0 for next word
  
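The sequence above can be modelled in Python to show why the carry must be computed before the accumulator is overwritten. This is a sketch, not a definitive implementation: the function names are hypothetical, elements are assumed to be unsigned values below 2^SEW, and each multi-word operand is given as a list of word vectors ordered least-significant word first.

.. code-block:: python

  def vadc(vs2, vs1, carry_in, sew):
      """Model of vadc.vvm: SEW-bit sum including the carry-in bit."""
      mask = (1 << sew) - 1
      return [(a + b + c) & mask for a, b, c in zip(vs2, vs1, carry_in)]


  def vmadc(vs2, vs1, carry_in, sew):
      """Model of vmadc.vvm: carry-out bit of the same addition."""
      return [(a + b + c) >> sew for a, b, c in zip(vs2, vs1, carry_in)]


  def add_multiword(acc_words, addend_words, sew):
      """Add two multi-word vectors word by word, propagating carries."""
      v0 = [0] * len(acc_words[0])            # no carry into the lowest word
      result = []
      for v4, v8 in zip(acc_words, addend_words):
          v1 = vmadc(v4, v8, v0, sew)         # carry first, inputs still unchanged
          result.append(vadc(v4, v8, v0, sew))
          v0 = v1                             # vmmv.m v0, v1
      return result, v0


  # Two elements of two 8-bit words each: 0x00FF + 0x0001 and 0xFF01 + 0x0101.
  words, carry = add_multiword([[0xFF, 0x01], [0x00, 0xFF]],
                               [[0x01, 0x01], [0x00, 0x01]], sew=8)

Here the first element yields 0x0100 with no final carry, and the second yields 0x0002 with a final carry of 1.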

..
  The subtract with borrow instruction `vsbc` performs the equivalent
  function to support long word arithmetic for subtraction.  There are
  no subtract with immediate instructions.


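The subtract-with-borrow instruction `vsbc` performs the equivalent
function to support long-word arithmetic for subtraction.  There are
no subtract-with-immediate instructions.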

::

   # Produce difference with borrow
  
   # vd[i] = vs2[i] - vs1[i] - v0.mask[i]
   vsbc.vvm   vd, vs2, vs1, v0  # Vector-vector
  
   # vd[i] = vs2[i] - x[rs1] - v0.mask[i]
   vsbc.vxm   vd, vs2, rs1, v0  # Vector-scalar
  
   # Produce borrow out in mask register format
  
   # vd.mask[i] = borrow_out(vs2[i] - vs1[i] - v0.mask[i])
   vmsbc.vvm   vd, vs2, vs1, v0  # Vector-vector
  
   # vd.mask[i] = borrow_out(vs2[i] - x[rs1] - v0.mask[i])
   vmsbc.vxm   vd, vs2, rs1, v0  # Vector-scalar
  
   # vd.mask[i] = borrow_out(vs2[i] - vs1[i])
   vmsbc.vv    vd, vs2, vs1      # Vector-vector, no borrow-in
  
   # vd.mask[i] = borrow_out(vs2[i] - x[rs1])
   vmsbc.vx    vd, vs2, rs1      # Vector-scalar, no borrow-in
  
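A minimal Python sketch of the subtract-with-borrow pair is given below. It assumes elements are unsigned values below 2^SEW and takes the borrow to be 1 exactly when the untruncated difference is negative; the function names are hypothetical.

.. code-block:: python

  def vsbc(vs2, vs1, borrow_in, sew):
      """Model of vsbc.vvm: SEW-bit difference including the borrow-in bit."""
      mask = (1 << sew) - 1
      return [(a - b - c) & mask for a, b, c in zip(vs2, vs1, borrow_in)]


  def vmsbc(vs2, vs1, borrow_in, sew):
      """Model of vmsbc.vvm: 1 if the difference before truncation is negative."""
      return [1 if (a - b - c) < 0 else 0 for a, b, c in zip(vs2, vs1, borrow_in)]


  difference = vsbc([0, 5, 5], [0, 5, 6], [1, 0, 0], sew=8)
  borrow = vmsbc([0, 5, 5], [0, 5, 6], [1, 0, 0], sew=8)

The first element shows that a borrow-in alone can produce a borrow-out (0 - 0 - 1), while the second element (5 - 5 - 0) produces none.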

..
  For `vmsbc`, the borrow is defined to be 1 iff the difference, prior to
  truncation, is negative.


For `vmsbc`, the borrow is defined to be 1 if and only if the difference, prior to
truncation, is negative.

..
  For `vadc` and `vsbc`, the instruction encoding is reserved if the
  destination vector register is `v0`.


For `vadc` and `vsbc`, the instruction encoding is reserved if the
destination vector register is `v0`.

..
  NOTE: This constraint corresponds to the constraint on masked vector
  operations that overwrite the mask register.


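.. note::

  This constraint corresponds to the constraint on masked vector
  operations that overwrite the mask register.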

  
************************************************************
Vector Bitwise Logical Instructions
************************************************************

..
  ----
  # Bitwise logical operations.
  vand.vv vd, vs2, vs1, vm   # Vector-vector
  vand.vx vd, vs2, rs1, vm   # vector-scalar
  vand.vi vd, vs2, imm, vm   # vector-immediate
  
  vor.vv vd, vs2, vs1, vm    # Vector-vector
  vor.vx vd, vs2, rs1, vm    # vector-scalar
  vor.vi vd, vs2, imm, vm    # vector-immediate
  
  vxor.vv vd, vs2, vs1, vm    # Vector-vector
  vxor.vx vd, vs2, rs1, vm    # vector-scalar
  vxor.vi vd, vs2, imm, vm    # vector-immediate
  ----


::

  # Bitwise logical operations.
  vand.vv vd, vs2, vs1, vm   # Vector-vector
  vand.vx vd, vs2, rs1, vm   # vector-scalar
  vand.vi vd, vs2, imm, vm   # vector-immediate
  
  vor.vv vd, vs2, vs1, vm    # Vector-vector
  vor.vx vd, vs2, rs1, vm    # vector-scalar
  vor.vi vd, vs2, imm, vm    # vector-immediate
  
  vxor.vv vd, vs2, vs1, vm    # Vector-vector
  vxor.vx vd, vs2, rs1, vm    # vector-scalar
  vxor.vi vd, vs2, imm, vm    # vector-immediate
  

..
  NOTE: With an immediate of -1, scalar-immediate forms of the `vxor`
  instruction provide a bitwise NOT operation.  This can be provided as
  an assembler pseudoinstruction `vnot.v`.


.. note::

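  With an immediate of -1, the immediate form of the `vxor` instruction
  (`vxor.vi`) provides a bitwise NOT operation.  This can be provided as
  an assembler pseudoinstruction `vnot.v`.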
  

************************************************************
Vector Single-Width Bit Shift Instructions
************************************************************

..
  A full complement of vector shift instructions are provided, including
  logical shift left, and logical (zero-extending) and arithmetic
  (sign-extending) shift right.  The data to be shifted is in the vector
  register group specified by `vs2` and the shift amount can be a vector
  register group `vs1`, a scalar integer register `rs1`, or an
  immediate.  The low lg2(SEW) bits of the vector or scalar shift-amount
  value are used, and shift-amount immediates are zero-extended.


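A full complement of vector shift instructions is provided, including
logical shift left, and logical (zero-extending) and arithmetic
(sign-extending) shift right.  The data to be shifted is in the vector
register group specified by `vs2`, and the shift amount can be a vector
register group `vs1`, a scalar integer register `rs1`, or an
immediate.  Only the low lg2(SEW) bits of the vector or scalar shift-amount
value are used, and shift-amount immediates are zero-extended.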

..
  ----
  # Bit shift operations
  vsll.vv vd, vs2, vs1, vm   # Vector-vector
  vsll.vx vd, vs2, rs1, vm   # vector-scalar
  vsll.vi vd, vs2, uimm, vm   # vector-immediate
  
  vsrl.vv vd, vs2, vs1, vm   # Vector-vector
  vsrl.vx vd, vs2, rs1, vm   # vector-scalar
  vsrl.vi vd, vs2, uimm, vm   # vector-immediate
  
  vsra.vv vd, vs2, vs1, vm   # Vector-vector
  vsra.vx vd, vs2, rs1, vm   # vector-scalar
  vsra.vi vd, vs2, uimm, vm   # vector-immediate
  ----


::

  # Bit shift operations
  vsll.vv vd, vs2, vs1, vm   # Vector-vector
  vsll.vx vd, vs2, rs1, vm   # vector-scalar
  vsll.vi vd, vs2, uimm, vm   # vector-immediate
  
  vsrl.vv vd, vs2, vs1, vm   # Vector-vector
  vsrl.vx vd, vs2, rs1, vm   # vector-scalar
  vsrl.vi vd, vs2, uimm, vm   # vector-immediate
  
  vsra.vv vd, vs2, vs1, vm   # Vector-vector
  vsra.vx vd, vs2, rs1, vm   # vector-scalar
  vsra.vi vd, vs2, uimm, vm   # vector-immediate
  
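The following Python sketch models the vector-scalar forms to illustrate the truncation of the shift amount to its low lg2(SEW) bits. It assumes SEW is a power of two, ignores masking and `vl`, and uses hypothetical function names.

.. code-block:: python

  def vsll_vx(vs2, x, sew):
      """Logical shift left; only the low lg2(SEW) bits of x are used."""
      mask = (1 << sew) - 1
      shamt = x & (sew - 1)
      return [(e << shamt) & mask for e in vs2]


  def vsrl_vx(vs2, x, sew):
      """Logical (zero-extending) shift right."""
      mask = (1 << sew) - 1
      shamt = x & (sew - 1)
      return [(e & mask) >> shamt for e in vs2]


  def vsra_vx(vs2, x, sew):
      """Arithmetic (sign-extending) shift right."""
      mask = (1 << sew) - 1
      shamt = x & (sew - 1)
      out = []
      for e in vs2:
          e &= mask
          if e >> (sew - 1):           # negative in two's complement
              e -= 1 << sew
          out.append((e >> shamt) & mask)
      return out


  left = vsll_vx([0x81], 9, sew=8)     # 9 & 7 == 1, so this shifts by 1
  logical = vsrl_vx([0x81], 1, sew=8)
  arithmetic = vsra_vx([0x81], 1, sew=8)

With SEW=8, a requested shift of 9 behaves as a shift of 1, and the two right shifts of 0x81 differ only in the bit shifted into the top position.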

************************************************************
Vector Narrowing Integer Right Shift Instructions
************************************************************

..
  The narrowing right shifts extract a smaller field from a wider
  operand and have both zero-extending (`srl`) and sign-extending
  (`sra`) forms.  The shift amount can come from a vector or a scalar
  `x` register or a 5-bit immediate.  The low lg2(2*SEW) bits of the
  vector or scalar shift-amount value are used (e.g., the low 6 bits for
  a SEW=64-bit to SEW=32-bit narrowing operation).  The immediate forms
  zero-extend their shift-amount immediate operand.


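The narrowing right shifts extract a smaller field from a wider
operand and have both zero-extending (`srl`) and sign-extending
(`sra`) forms.  The shift amount can come from a vector register, a scalar
`x` register, or a 5-bit immediate.  The low lg2(2*SEW) bits of the
vector or scalar shift-amount value are used (e.g., the low 6 bits for
a SEW=64-bit to SEW=32-bit narrowing operation).  The immediate forms
zero-extend their shift-amount immediate operand.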

..
  ----
   # Narrowing shift right logical, SEW = (2*SEW) >> SEW
   vnsrl.wv vd, vs2, vs1, vm   # vector-vector
   vnsrl.wx vd, vs2, rs1, vm   # vector-scalar
   vnsrl.wi vd, vs2, uimm, vm   # vector-immediate
  
   # Narrowing shift right arithmetic, SEW = (2*SEW) >> SEW
   vnsra.wv vd, vs2, vs1, vm   # vector-vector
   vnsra.wx vd, vs2, rs1, vm   # vector-scalar
   vnsra.wi vd, vs2, uimm, vm   # vector-immediate
  ----


::

   # Narrowing shift right logical, SEW = (2*SEW) >> SEW
   vnsrl.wv vd, vs2, vs1, vm   # vector-vector
   vnsrl.wx vd, vs2, rs1, vm   # vector-scalar
   vnsrl.wi vd, vs2, uimm, vm   # vector-immediate
  
   # Narrowing shift right arithmetic, SEW = (2*SEW) >> SEW
   vnsra.wv vd, vs2, vs1, vm   # vector-vector
   vnsra.wx vd, vs2, rs1, vm   # vector-scalar
   vnsra.wi vd, vs2, uimm, vm   # vector-immediate
  

..
  NOTE: It could be useful to add support for `n4` variants, where the
  destination is 1/4 width of source.


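.. note::

  It could be useful to add support for `n4` variants, where the
  destination is 1/4 the width of the source.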

  
..
  NOTE: An integer value can be halved in width using the narrowing integer
  shift instructions with a scalar operand of x0. Can define assembly
  pseudoinstructions `vncvt.x.x.w vd,vs,vm` = `vnsrl.wx vd,vs,x0,vm`.


.. note::

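  An integer value can be halved in width using the narrowing integer
  shift instructions with a scalar operand of `x0`.  The assembly
  pseudoinstruction `vncvt.x.x.w vd,vs,vm` = `vnsrl.wx vd,vs,x0,vm` can be defined for this purpose.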
  
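As a rough model, the narrowing shifts and the `vncvt.x.x.w` case (shift amount 0) can be written in Python as below. The function names are hypothetical, source elements are 2*SEW-bit patterns, and masking and `vl` are ignored.

.. code-block:: python

  def sign_extend(value, bits):
      """Interpret the low `bits` bits of value as a two's-complement integer."""
      value &= (1 << bits) - 1
      return value - (1 << bits) if value >> (bits - 1) else value


  def vnsrl_wx(vs2, x, sew):
      """Narrowing logical right shift: 2*SEW-bit source, SEW-bit result."""
      shamt = x & (2 * sew - 1)                 # low lg2(2*SEW) bits
      wide_mask = (1 << (2 * sew)) - 1
      narrow_mask = (1 << sew) - 1
      return [((e & wide_mask) >> shamt) & narrow_mask for e in vs2]


  def vnsra_wx(vs2, x, sew):
      """Narrowing arithmetic right shift: sign bits are shifted in."""
      shamt = x & (2 * sew - 1)
      narrow_mask = (1 << sew) - 1
      return [(sign_extend(e, 2 * sew) >> shamt) & narrow_mask for e in vs2]


  high_byte = vnsrl_wx([0xABCD], 8, sew=8)
  truncated = vnsrl_wx([0xABCD], 0, sew=8)      # the vncvt.x.x.w case
  arithmetic = vnsra_wx([0x8000], 12, sew=8)

A shift amount of 0 simply keeps the low SEW bits of each source element, which is why it can serve as a width-halving conversion.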

************************************************************
Vector Integer Comparison Instructions
************************************************************

..
  The following integer compare instructions write 1 to the destination
  mask register element if the comparison evaluates to true, and 0
  otherwise.  The destination mask vector is always held in a single
  vector register, with a layout of elements as described in Section
  :ref:`sec-mask-register-layout` .  The destination mask vector register
  may be the same as the source vector mask register (`v0`).


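The following integer compare instructions write 1 to the destination
mask register element if the comparison evaluates to true, and 0
otherwise.  The destination mask vector is always held in a single
vector register, with a layout of elements as described in Section
:ref:`sec-mask-register-layout`.  The destination mask vector register
may be the same as the source vector mask register (`v0`).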

..
  ----
  # Set if equal
  vmseq.vv vd, vs2, vs1, vm  # Vector-vector
  vmseq.vx vd, vs2, rs1, vm  # vector-scalar
  vmseq.vi vd, vs2, imm, vm  # vector-immediate
  
  # Set if not equal
  vmsne.vv vd, vs2, vs1, vm  # Vector-vector
  vmsne.vx vd, vs2, rs1, vm  # vector-scalar
  vmsne.vi vd, vs2, imm, vm  # vector-immediate
  
  # Set if less than, unsigned
  vmsltu.vv vd, vs2, vs1, vm  # Vector-vector
  vmsltu.vx vd, vs2, rs1, vm  # Vector-scalar
  
  # Set if less than, signed
  vmslt.vv vd, vs2, vs1, vm  # Vector-vector
  vmslt.vx vd, vs2, rs1, vm  # vector-scalar
  
  # Set if less than or equal, unsigned
  vmsleu.vv vd, vs2, vs1, vm   # Vector-vector
  vmsleu.vx vd, vs2, rs1, vm   # vector-scalar
  vmsleu.vi vd, vs2, imm, vm   # Vector-immediate
  
  # Set if less than or equal, signed
  vmsle.vv vd, vs2, vs1, vm  # Vector-vector
  vmsle.vx vd, vs2, rs1, vm  # vector-scalar
  vmsle.vi vd, vs2, imm, vm  # vector-immediate
  
  # Set if greater than, unsigned
  vmsgtu.vx vd, vs2, rs1, vm   # Vector-scalar
  vmsgtu.vi vd, vs2, imm, vm   # Vector-immediate
  
  # Set if greater than, signed
  vmsgt.vx vd, vs2, rs1, vm    # Vector-scalar
  vmsgt.vi vd, vs2, imm, vm    # Vector-immediate
  
  # Following two instructions are not provided directly
  # Set if greater than or equal, unsigned
  # vmsgeu.vx vd, vs2, rs1, vm    # Vector-scalar
  # Set if greater than or equal, signed
  # vmsge.vx vd, vs2, rs1, vm    # Vector-scalar
  ----


::

  # Set if equal
  vmseq.vv vd, vs2, vs1, vm  # Vector-vector
  vmseq.vx vd, vs2, rs1, vm  # vector-scalar
  vmseq.vi vd, vs2, imm, vm  # vector-immediate
  
  # Set if not equal
  vmsne.vv vd, vs2, vs1, vm  # Vector-vector
  vmsne.vx vd, vs2, rs1, vm  # vector-scalar
  vmsne.vi vd, vs2, imm, vm  # vector-immediate
  
  # Set if less than, unsigned
  vmsltu.vv vd, vs2, vs1, vm  # Vector-vector
  vmsltu.vx vd, vs2, rs1, vm  # Vector-scalar
  
  # Set if less than, signed
  vmslt.vv vd, vs2, vs1, vm  # Vector-vector
  vmslt.vx vd, vs2, rs1, vm  # vector-scalar
  
  # Set if less than or equal, unsigned
  vmsleu.vv vd, vs2, vs1, vm   # Vector-vector
  vmsleu.vx vd, vs2, rs1, vm   # vector-scalar
  vmsleu.vi vd, vs2, imm, vm   # Vector-immediate
  
  # Set if less than or equal, signed
  vmsle.vv vd, vs2, vs1, vm  # Vector-vector
  vmsle.vx vd, vs2, rs1, vm  # vector-scalar
  vmsle.vi vd, vs2, imm, vm  # vector-immediate
  
  # Set if greater than, unsigned
  vmsgtu.vx vd, vs2, rs1, vm   # Vector-scalar
  vmsgtu.vi vd, vs2, imm, vm   # Vector-immediate
  
  # Set if greater than, signed
  vmsgt.vx vd, vs2, rs1, vm    # Vector-scalar
  vmsgt.vi vd, vs2, imm, vm    # Vector-immediate
  
  # Following two instructions are not provided directly
  # Set if greater than or equal, unsigned
  # vmsgeu.vx vd, vs2, rs1, vm    # Vector-scalar
  # Set if greater than or equal, signed
  # vmsge.vx vd, vs2, rs1, vm    # Vector-scalar
  

..
  The following table indicates how all comparisons are implemented in
  native machine code.


The following table indicates how all comparisons are implemented in
native machine code.

..
  ----
  Comparison      Assembler Mapping             Assembler Pseudoinstruction
  
  va < vb         vmslt{u}.vv vd, va, vb, vm
  va <= vb        vmsle{u}.vv vd, va, vb, vm
  va > vb         vmslt{u}.vv vd, vb, va, vm    vmsgt{u}.vv vd, va, vb, vm
  va >= vb        vmsle{u}.vv vd, vb, va, vm    vmsge{u}.vv vd, va, vb, vm
  
  va < x          vmslt{u}.vx vd, va, x, vm
  va <= x         vmsle{u}.vx vd, va, x, vm
  va > x          vmsgt{u}.vx vd, va, x, vm
  va >= x         see below
  
  va < i          vmsle{u}.vi vd, va, i-1, vm    vmslt{u}.vi vd, va, i, vm
  va <= i         vmsle{u}.vi vd, va, i, vm
  va > i          vmsgt{u}.vi vd, va, i, vm
  va >= i         vmsgt{u}.vi vd, va, i-1, vm    vmsge{u}.vi vd, va, i, vm
  
  va, vb vector register groups
  x      scalar integer register
  i      immediate
  ----


::

  Comparison      Assembler Mapping             Assembler Pseudoinstruction
  
  va < vb         vmslt{u}.vv vd, va, vb, vm
  va <= vb        vmsle{u}.vv vd, va, vb, vm
  va > vb         vmslt{u}.vv vd, vb, va, vm    vmsgt{u}.vv vd, va, vb, vm
  va >= vb        vmsle{u}.vv vd, vb, va, vm    vmsge{u}.vv vd, va, vb, vm
  
  va < x          vmslt{u}.vx vd, va, x, vm
  va <= x         vmsle{u}.vx vd, va, x, vm
  va > x          vmsgt{u}.vx vd, va, x, vm
  va >= x         see below
  
  va < i          vmsle{u}.vi vd, va, i-1, vm    vmslt{u}.vi vd, va, i, vm
  va <= i         vmsle{u}.vi vd, va, i, vm
  va > i          vmsgt{u}.vi vd, va, i, vm
  va >= i         vmsgt{u}.vi vd, va, i-1, vm    vmsge{u}.vi vd, va, i, vm
  
  va, vb vector register groups
  x      scalar integer register
  i      immediate
  
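The immediate rows of the table rely on rewriting `va < i` as `va <= i-1`. The Python sketch below models that mapping for the signed and unsigned cases, assuming a 5-bit immediate that is sign-extended to SEW bits before the comparison. The function names are hypothetical, and out-of-range immediates raise `ValueError` purely as a modelling choice.

.. code-block:: python

  def sign_extend(value, bits):
      """Interpret the low `bits` bits of value as a two's-complement integer."""
      value &= (1 << bits) - 1
      return value - (1 << bits) if value >> (bits - 1) else value


  def vmsle_vi(va, imm, sew):
      """Signed va[i] <= imm, with imm a 5-bit signed immediate."""
      if not -16 <= imm <= 15:
          raise ValueError("immediate does not fit in 5 signed bits")
      return [1 if sign_extend(a, sew) <= imm else 0 for a in va]


  def vmsleu_vi(va, imm, sew):
      """Unsigned va[i] <= imm, after sign-extending imm to SEW bits."""
      if not -16 <= imm <= 15:
          raise ValueError("immediate does not fit in 5 signed bits")
      mask = (1 << sew) - 1
      return [1 if (a & mask) <= (imm & mask) else 0 for a in va]


  def vmslt_vi(va, i, sew):
      """Pseudoinstruction va < i (signed), usable for i in -15..16."""
      return vmsle_vi(va, i - 1, sew)


  def vmsltu_vi(va, i, sew):
      """Pseudoinstruction va < i (unsigned), for i in 1..16 or 2^SEW-15..2^SEW."""
      limit = 1 << sew
      if 1 <= i <= 16:
          imm = i - 1
      elif limit - 15 <= i <= limit:
          imm = i - 1 - limit              # encodes as a negative 5-bit immediate
      else:
          raise ValueError("immediate not encodable via vmsleu.vi")
      return vmsleu_vi(va, imm, sew)


  signed_lt = vmslt_vi([0xF0, 0x0F, 0x10], 16, sew=8)
  unsigned_lt = vmsltu_vi([0xEF, 0xF0, 0xFF], 0xF1, sew=8)

In the unsigned example with SEW=8, the comparison against 0xF1 is encoded with the immediate -16, which sign-extends to 0xF0.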

..
  NOTE: The immediate forms of `vmslt{u}.vi` are not provided as the
  immediate value can be decreased by 1 and the `vmsle{u}.vi` variants
  used instead.  The `vmsle.vi` range is -16 to 15, resulting in an
  effective `vmslt.vi` range of -15 to 16.  The `vmsleu.vi` range is 0
  to 15 giving an effective `vmsltu.vi` range of 1 to 16 (Note,
  `vmsltu.vi` with immediate 0 is not useful as it is always
  false). Because the 5-bit vector immediates are always sign-extended,
  `vmsleu.vi` also supports unsigned immediate values in the range
  `2^SEW^-16` to `2^SEW^-1`, allowing corresponding `vmsltu.vi`
  comparisons against unsigned immediates in the range `2^SEW^-15` to
  `2^SEW^`.  Note that `vlsltu.vi` with immediate `2^SEW^` is not useful
  as it is always true.


.. note::

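  The immediate forms of `vmslt{u}.vi` are not provided, as the
  immediate value can be decreased by 1 and the `vmsle{u}.vi` variants
  used instead.  The `vmsle.vi` range is -16 to 15, resulting in an
  effective `vmslt.vi` range of -15 to 16.  The `vmsleu.vi` range is 0
  to 15, giving an effective `vmsltu.vi` range of 1 to 16 (note that
  `vmsltu.vi` with immediate 0 is not useful, as it is always
  false).  Because the 5-bit vector immediates are always sign-extended,
  `vmsleu.vi` also supports unsigned immediate values in the range
  `2^SEW - 16` to `2^SEW - 1`, allowing corresponding `vmsltu.vi`
  comparisons against unsigned immediates in the range `2^SEW - 15` to
  `2^SEW`.  Note that `vmsltu.vi` with immediate `2^SEW` is not useful,
  as it is always true.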
  
..
  Similarly, `vmsge{u}.vi` is not provided and the comparison is
  implemented using `vmsgt{u}.vi` with the immediate decremented by one.
  The resulting effective `vmsge.vi` range is -15 to 16, and the
  resulting effective `vmsgeu.vi` range is 1 to 16 (Note, `vmsgeu.vi` with
  immediate 0 is not useful as it is always true).


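Similarly, `vmsge{u}.vi` is not provided, and the comparison is
implemented using `vmsgt{u}.vi` with the immediate decremented by one.
The resulting effective `vmsge.vi` range is -15 to 16, and the
resulting effective `vmsgeu.vi` range is 1 to 16 (note that `vmsgeu.vi` with
immediate 0 is not useful, as it is always true).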

..
  NOTE: The `vmsgt` forms for register scalar and immediates are provided
  to allow a single comparison instruction to provide the correct
  polarity of mask value without using additional mask logical
  instructions.


.. note::

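  The `vmsgt` forms for register scalar and immediates are provided
  to allow a single comparison instruction to provide the correct
  polarity of mask value without using additional mask logical
  instructions.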
  
..
  To reduce encoding space, the `vmsge{u}.vx` form is not directly
  provided, and so the `va {ge} x` case requires special treatment.


To reduce encoding space, the `vmsge{u}.vx` form is not directly
provided, and so the `va >= x` case requires special treatment.

..
  NOTE: The `vmsge{u}.vx` could potentially be encoded in a
  non-orthogonal way under the unused OPIVI variant of `vmslt{u}`.  These
  would be the only instructions in OPIVI that use a scalar `x`register
  however.  Alternatively, a further two funct6 encodings could be used,
  but these would have a different operand format (writes to mask
  register) than others in the same group of 8 funct6 encodings.  The
  current PoR is to omit these instructions and to synthesize where
  needed as described below.


.. note::

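  The `vmsge{u}.vx` could potentially be encoded in a
  non-orthogonal way under the unused OPIVI variant of `vmslt{u}`.  These
  would be the only instructions in OPIVI that use a scalar `x` register,
  however.  Alternatively, a further two funct6 encodings could be used,
  but these would have a different operand format (writes to mask
  register) than others in the same group of 8 funct6 encodings.  The
  current PoR is to omit these instructions and to synthesize them where
  needed as described below.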
  
..
  The `vmsge{u}.vx` operation can be synthesized by reducing the
  value of `x` by 1 and using the `vmsgt{u}.vx` instruction, when it is
  known that this will not underflow the representation in `x`.


The `vmsge{u}.vx` operation can be synthesized by reducing the
value of `x` by 1 and using the `vmsgt{u}.vx` instruction, when it is
known that this will not underflow the representation in `x`.

..
  ----
  Sequences to synthesize `vmsge{u}.vx` instruction
  
  va >= x,  x > minimum
  
     addi t0, x, -1; vmsgt{u}.vx vd, va, t0, vm
  ----


::

  Sequences to synthesize `vmsge{u}.vx` instruction
  
  va >= x,  x > minimum
  
     addi t0, x, -1; vmsgt{u}.vx vd, va, t0, vm
  
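The underflow restriction on this sequence can be demonstrated with a small Python model of the signed case. It assumes that `x` holds a SEW-bit signed value and that only the low SEW bits of the decremented scalar take part in the comparison; the function names are hypothetical.

.. code-block:: python

  def sign_extend(value, bits):
      """Interpret the low `bits` bits of value as a two's-complement integer."""
      value &= (1 << bits) - 1
      return value - (1 << bits) if value >> (bits - 1) else value


  def vmsgt_vx(va, x, sew):
      """Signed va[i] > x, using the low SEW bits of x."""
      return [1 if sign_extend(a, sew) > sign_extend(x, sew) else 0 for a in va]


  def vmsge_vx_via_addi(va, x, sew):
      """Synthesized va[i] >= x; only valid when x is not the minimum value."""
      t0 = x - 1                          # addi t0, x, -1
      return vmsgt_vx(va, t0, sew)        # vmsgt.vx vd, va, t0


  va = [0x80, 0x00, 0x7F]                 # -128, 0, 127 as 8-bit patterns
  correct = vmsge_vx_via_addi(va, 0, sew=8)
  underflowed = vmsge_vx_via_addi(va, -128, sew=8)

For `x = 0` the result matches `va >= 0`. For `x = -128` the decrement wraps to 127 within 8 bits, so every element compares false even though `va >= -128` is true for all of them.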

..
  The above sequence will usually be the most efficient implementation,
  but assembler pseudoinstructions can be provided for cases where the
  range of `x` is unknown.


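The above sequence will usually be the most efficient implementation,
but assembler pseudoinstructions can be provided for cases where the
range of `x` is unknown.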

..
  ----
  unmasked va >= x
  
    pseudoinstruction: vmsge{u}.vx vd, va, x
    expansion: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd
  
  masked va >= x, vd != v0
  
    pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t
    expansion: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
  
  masked va >= x, vd == v0
  
    pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
    expansion: vmslt{u}.vx vt, va, x;  vmandnot.mm vd, vd, vt
  
  masked va >= x, any vd
  
    pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
    expansion: vmslt{u}.vx vt, va, x;  vmandnot.mm vt, v0, vt;  vmandnot.mm vd, vd, v0;  vmor.mm vd, vt, vd
  
    The vt argument to the pseudoinstruction must name a temporary vector register that is
    not same as vd and which will be clobbered by the pseudoinstruction
  ----


::

  unmasked va >= x
  
    pseudoinstruction: vmsge{u}.vx vd, va, x
    expansion: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd
  
  masked va >= x, vd != v0
  
    pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t
    expansion: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
  
  masked va >= x, vd == v0
  
    pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
    expansion: vmslt{u}.vx vt, va, x;  vmandnot.mm vd, vd, vt
  
  masked va >= x, any vd
  
    pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
    expansion: vmslt{u}.vx vt, va, x;  vmandnot.mm vt, v0, vt;  vmandnot.mm vd, vd, v0;  vmor.mm vd, vt, vd
  
    The vt argument to the pseudoinstruction must name a temporary vector register that is
    not the same as vd and which will be clobbered by the pseudoinstruction
  
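The masked expansions can be checked exhaustively on a small example. The Python sketch below models mask registers as lists of bits, assumes a mask-undisturbed policy for the masked `vmslt{u}.vx`, and compares each expansion with a direct reference. All function names are hypothetical.

.. code-block:: python

  from itertools import product


  def vmslt(va, x):
      """Unmasked va[i] < x as a list of mask bits."""
      return [1 if a < x else 0 for a in va]


  def reference(va, x, v0, old_vd):
      """Masked va >= x: inactive elements keep their old destination bit."""
      return [(1 if a >= x else 0) if m else o for a, m, o in zip(va, v0, old_vd)]


  def expand_vd_not_v0(va, x, v0, old_vd):
      lt = vmslt(va, x)
      vd = [l if m else o for l, m, o in zip(lt, v0, old_vd)]  # vmslt{u}.vx vd, va, x, v0.t
      return [d ^ m for d, m in zip(vd, v0)]                   # vmxor.mm vd, vd, v0


  def expand_vd_is_v0(va, x, v0):
      vt = vmslt(va, x)                                        # vmslt{u}.vx vt, va, x
      return [d & (1 - t) for d, t in zip(v0, vt)]             # vmandnot.mm vd, vd, vt


  def expand_any_vd(va, x, v0, old_vd):
      vt = vmslt(va, x)                                        # vmslt{u}.vx vt, va, x
      vt = [m & (1 - t) for m, t in zip(v0, vt)]               # vmandnot.mm vt, v0, vt
      vd = [o & (1 - m) for o, m in zip(old_vd, v0)]           # vmandnot.mm vd, vd, v0
      return [t | d for t, d in zip(vt, vd)]                   # vmor.mm vd, vt, vd


  def check_all(va=(1, 5, 9), x=5):
      """Return True if every expansion matches the reference for all masks."""
      for v0 in product((0, 1), repeat=len(va)):
          if expand_vd_is_v0(va, x, v0) != reference(va, x, v0, v0):
              return False
          for old in product((0, 1), repeat=len(va)):
              ref = reference(va, x, v0, old)
              if expand_vd_not_v0(va, x, v0, old) != ref:
                  return False
              if expand_any_vd(va, x, v0, old) != ref:
                  return False
      return True

Calling `check_all()` walks every combination of mask bits and previous destination bits for a three-element vector.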

..
  Comparisons effectively AND in the mask under a mask-undisturbed policy e.g,

Comparisons effectively AND in the mask under a mask-undisturbed policy, for example:

..
  ----
      # (a < b) && (b < c) in two instructions when mask-undisturbed
      vmslt.vv    v0, va, vb        # All body elements written
      vmslt.vv    v0, vb, vc, v0.t  # Only update at set mask
  ----


::

      # (a < b) && (b < c) in two instructions when mask-undisturbed
      vmslt.vv    v0, va, vb        # All body elements written
      vmslt.vv    v0, vb, vc, v0.t  # Only update at set mask
  
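A Python model of this two-instruction sequence is sketched below. It assumes a mask-undisturbed policy, so inactive elements keep their previous destination bit, and it compares small non-negative integers directly. The function name and its keyword arguments are hypothetical.

.. code-block:: python

  def vmslt_vv(vs2, vs1, v0=None, old_vd=None):
      """Model of vmslt.vv; when v0 is given, inactive elements keep old_vd."""
      result = [1 if a < b else 0 for a, b in zip(vs2, vs1)]
      if v0 is None:
          return result
      return [r if m else o for r, m, o in zip(result, v0, old_vd)]


  va, vb, vc = [1, 5, 3], [2, 4, 6], [3, 9, 6]
  v0 = vmslt_vv(va, vb)                      # all body elements written
  v0 = vmslt_vv(vb, vc, v0=v0, old_vd=v0)    # only updated where v0 was set

The final mask equals `(a < b) && (b < c)` because elements that failed the first comparison are inactive in the second and therefore stay 0.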

..
  Comparisons write mask registers, and so always operate under a
  tail-agnostic policy.


Comparisons write mask registers, and so always operate under a
tail-agnostic policy.


************************************************************
Vector Integer Min/Max Instructions
************************************************************

..
  Signed and unsigned integer minimum and maximum instructions are
  supported.

Signed and unsigned integer minimum and maximum instructions are
supported.

..
  ----
  # Unsigned minimum
  vminu.vv vd, vs2, vs1, vm   # Vector-vector
  vminu.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Signed minimum
  vmin.vv vd, vs2, vs1, vm   # Vector-vector
  vmin.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Unsigned maximum
  vmaxu.vv vd, vs2, vs1, vm   # Vector-vector
  vmaxu.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Signed maximum
  vmax.vv vd, vs2, vs1, vm   # Vector-vector
  vmax.vx vd, vs2, rs1, vm   # vector-scalar
  ----


::

  # Unsigned minimum
  vminu.vv vd, vs2, vs1, vm   # Vector-vector
  vminu.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Signed minimum
  vmin.vv vd, vs2, vs1, vm   # Vector-vector
  vmin.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Unsigned maximum
  vmaxu.vv vd, vs2, vs1, vm   # Vector-vector
  vmaxu.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Signed maximum
  vmax.vv vd, vs2, vs1, vm   # Vector-vector
  vmax.vx vd, vs2, rs1, vm   # vector-scalar
  

************************************************************
Vector Single-Width Integer Multiply Instructions
************************************************************

..
  The single-width multiply instructions perform a SEW-bit*SEW-bit
  multiply and return an SEW-bit-wide result.  The `*mulh*` versions
  write the high word of the product to the destination register.


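The single-width multiply instructions perform a SEW-bit*SEW-bit
multiply and return an SEW-bit-wide result.  The `*mulh*` versions
write the high word of the product to the destination register.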

..
  ----
  # Signed multiply, returning low bits of product
  vmul.vv vd, vs2, vs1, vm   # Vector-vector
  vmul.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Signed multiply, returning high bits of product
  vmulh.vv vd, vs2, vs1, vm   # Vector-vector
  vmulh.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Unsigned multiply, returning high bits of product
  vmulhu.vv vd, vs2, vs1, vm   # Vector-vector
  vmulhu.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Signed(vs2)-Unsigned multiply, returning high bits of product
  vmulhsu.vv vd, vs2, vs1, vm   # Vector-vector
  vmulhsu.vx vd, vs2, rs1, vm   # vector-scalar
  ----


::

  # Signed multiply, returning low bits of product
  vmul.vv vd, vs2, vs1, vm   # Vector-vector
  vmul.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Signed multiply, returning high bits of product
  vmulh.vv vd, vs2, vs1, vm   # Vector-vector
  vmulh.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Unsigned multiply, returning high bits of product
  vmulhu.vv vd, vs2, vs1, vm   # Vector-vector
  vmulhu.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Signed(vs2)-Unsigned multiply, returning high bits of product
  vmulhsu.vv vd, vs2, vs1, vm   # Vector-vector
  vmulhsu.vx vd, vs2, rs1, vm   # vector-scalar
  
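The difference between the low-half and high-half results can be sketched per element in Python. The function names below mirror the mnemonics but are hypothetical scalar models; `a` corresponds to the `vs2` element and `b` to the `vs1` element or scalar.

.. code-block:: python

  def sign_extend(value, bits):
      """Interpret the low `bits` bits of value as a two's-complement integer."""
      value &= (1 << bits) - 1
      return value - (1 << bits) if value >> (bits - 1) else value


  def vmul(a, b, sew):
      """Low SEW bits of the product (identical for signed and unsigned)."""
      return (a * b) & ((1 << sew) - 1)


  def vmulh(a, b, sew):
      """High SEW bits of the signed*signed product."""
      return ((sign_extend(a, sew) * sign_extend(b, sew)) >> sew) & ((1 << sew) - 1)


  def vmulhu(a, b, sew):
      """High SEW bits of the unsigned*unsigned product."""
      mask = (1 << sew) - 1
      return ((a & mask) * (b & mask)) >> sew


  def vmulhsu(a, b, sew):
      """High SEW bits of signed(a)*unsigned(b), with a taken from vs2."""
      mask = (1 << sew) - 1
      return ((sign_extend(a, sew) * (b & mask)) >> sew) & mask


  low = vmul(0xFF, 0x02, sew=8)
  high_signed = vmulh(0xFF, 0x02, sew=8)      # -1 * 2
  high_unsigned = vmulhu(0xFF, 0x02, sew=8)   # 255 * 2
  high_mixed = vmulhsu(0x02, 0xFF, sew=8)     # 2 * 255

The same bit patterns give different high halves depending on how each operand is interpreted, while the low half is unaffected.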

..
  NOTE: There is no `vmulhus` opcode to return high half of
  unsigned-vector * signed-scalar product.


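.. note::

  There is no `vmulhus` opcode to return the high half of an
  unsigned-vector * signed-scalar product.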

  
..
  NOTE: The current `vmulh*` opcodes perform simple fractional
  multiplies, but with no option to scale, round, and/or saturate the
  result.  A possible extension can consider variants of `vmulh`,
  `vmulhu`, `vmulhsu` that use the `vxrm` rounding mode when discarding
  low half of product.  There is no possibility of overflow in these
  cases.


.. note::

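  The current `vmulh*` opcodes perform simple fractional
  multiplies, but with no option to scale, round, and/or saturate the
  result.  A possible extension can consider variants of `vmulh`,
  `vmulhu`, `vmulhsu` that use the `vxrm` rounding mode when discarding
  the low half of the product.  There is no possibility of overflow in these
  cases.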
  

************************************************************
Vector Integer Divide Instructions
************************************************************

..
  The divide and remainder instructions are equivalent to the RISC-V
  standard scalar integer multiply/divides, with the same results for
  extreme inputs.


The divide and remainder instructions are equivalent to the RISC-V
standard scalar integer multiply/divides, with the same results for
extreme inputs.

..
  ----
      # Unsigned divide.
      vdivu.vv vd, vs2, vs1, vm   # Vector-vector
      vdivu.vx vd, vs2, rs1, vm   # vector-scalar
  
      # Signed divide
      vdiv.vv vd, vs2, vs1, vm   # Vector-vector
      vdiv.vx vd, vs2, rs1, vm   # vector-scalar
  
      # Unsigned remainder
      vremu.vv vd, vs2, vs1, vm   # Vector-vector
      vremu.vx vd, vs2, rs1, vm   # vector-scalar
  
      # Signed remainder
      vrem.vv vd, vs2, vs1, vm   # Vector-vector
      vrem.vx vd, vs2, rs1, vm   # vector-scalar
  ----


::

      # Unsigned divide.
      vdivu.vv vd, vs2, vs1, vm   # Vector-vector
      vdivu.vx vd, vs2, rs1, vm   # vector-scalar
  
      # Signed divide
      vdiv.vv vd, vs2, vs1, vm   # Vector-vector
      vdiv.vx vd, vs2, rs1, vm   # vector-scalar
  
      # Unsigned remainder
      vremu.vv vd, vs2, vs1, vm   # Vector-vector
      vremu.vx vd, vs2, rs1, vm   # vector-scalar
  
      # Signed remainder
      vrem.vv vd, vs2, vs1, vm   # Vector-vector
      vrem.vx vd, vs2, rs1, vm   # vector-scalar
  
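The "extreme inputs" are division by zero and signed overflow. The Python sketch below models one element under the rules of the scalar RISC-V "M" extension as understood here: division by zero yields an all-ones quotient and a remainder equal to the dividend, and the most negative value divided by -1 yields the dividend with remainder 0. The function names are hypothetical.

.. code-block:: python

  def sign_extend(value, bits):
      """Interpret the low `bits` bits of value as a two's-complement integer."""
      value &= (1 << bits) - 1
      return value - (1 << bits) if value >> (bits - 1) else value


  def vdiv(a, b, sew):
      """Signed quotient, rounding toward zero."""
      a, b = sign_extend(a, sew), sign_extend(b, sew)
      if b == 0:
          q = -1                               # division by zero: all bits set
      elif a == -(1 << (sew - 1)) and b == -1:
          q = a                                # overflow: quotient is the dividend
      else:
          q = abs(a) // abs(b)
          if (a < 0) != (b < 0):
              q = -q
      return q & ((1 << sew) - 1)


  def vrem(a, b, sew):
      """Signed remainder; its sign follows the dividend."""
      a, b = sign_extend(a, sew), sign_extend(b, sew)
      if b == 0:
          r = a                                # division by zero: the dividend
      elif a == -(1 << (sew - 1)) and b == -1:
          r = 0                                # overflow: remainder is zero
      else:
          r = abs(a) % abs(b)
          if a < 0:
              r = -r
      return r & ((1 << sew) - 1)


  def vdivu(a, b, sew):
      """Unsigned quotient; all ones on division by zero."""
      mask = (1 << sew) - 1
      a, b = a & mask, b & mask
      return mask if b == 0 else a // b


  def vremu(a, b, sew):
      """Unsigned remainder; the dividend on division by zero."""
      mask = (1 << sew) - 1
      a, b = a & mask, b & mask
      return a if b == 0 else a % b

For example, with SEW=8, dividing -7 (0xF9) by 2 gives a quotient of -3 (0xFD) and a remainder of -1 (0xFF).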

..
  NOTE: The decision to include integer divide and remainder was
  contentious. The argument in favor is that without a standard
  instruction, software would have to pick some algorithm to perform the
  operation, which would likely perform poorly on some
  microarchitectures versus others.


.. note::

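  The decision to include integer divide and remainder was
  contentious.  The argument in favor is that without a standard
  instruction, software would have to pick some algorithm to perform the
  operation, which would likely perform poorly on some
  microarchitectures versus others.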
  
..
  NOTE: There is no instruction to perform a "scalar divide by vector"
  operation.


.. note::

  There is no instruction to perform a "scalar divide by vector"
  operation.

  
************************************************************
Vector Widening Integer Multiply Instructions
************************************************************

..
  The widening integer multiply instructions return the full 2*SEW-bit
  product from an SEW-bit*SEW-bit multiply.


The widening integer multiply instructions return the full 2*SEW-bit
product from an SEW-bit*SEW-bit multiply.

..
  ----
  # Widening signed-integer multiply
  vwmul.vv  vd, vs2, vs1, vm # vector-vector
  vwmul.vx  vd, vs2, rs1, vm # vector-scalar
  
  # Widening unsigned-integer multiply
  vwmulu.vv vd, vs2, vs1, vm # vector-vector
  vwmulu.vx vd, vs2, rs1, vm # vector-scalar
  
  # Widening signed-unsigned integer multiply
  vwmulsu.vv vd, vs2, vs1, vm # vector-vector
  vwmulsu.vx vd, vs2, rs1, vm # vector-scalar
  ----


::

  # Widening signed-integer multiply
  vwmul.vv  vd, vs2, vs1, vm # vector-vector
  vwmul.vx  vd, vs2, rs1, vm # vector-scalar
  
  # Widening unsigned-integer multiply
  vwmulu.vv vd, vs2, vs1, vm # vector-vector
  vwmulu.vx vd, vs2, rs1, vm # vector-scalar
  
  # Widening signed-unsigned integer multiply
  vwmulsu.vv vd, vs2, vs1, vm # vector-vector
  vwmulsu.vx vd, vs2, rs1, vm # vector-scalar
  
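A per-element Python sketch of the widening multiplies follows. The function names are hypothetical, and treating the `vs2` operand as the signed one in `vwmulsu` is an assumption made here by analogy with the `vmulhsu` comment ("Signed(vs2)-Unsigned"); the listing above does not state it.

.. code-block:: python

  def sign_extend(value, bits):
      """Interpret the low `bits` bits of value as a two's-complement integer."""
      value &= (1 << bits) - 1
      return value - (1 << bits) if value >> (bits - 1) else value


  def vwmul(a, b, sew):
      """Full 2*SEW-bit signed*signed product."""
      return (sign_extend(a, sew) * sign_extend(b, sew)) & ((1 << (2 * sew)) - 1)


  def vwmulu(a, b, sew):
      """Full 2*SEW-bit unsigned*unsigned product."""
      mask = (1 << sew) - 1
      return (a & mask) * (b & mask)


  def vwmulsu(a, b, sew):
      """Full 2*SEW-bit product of signed(a) and unsigned(b); a assumed from vs2."""
      mask = (1 << sew) - 1
      return (sign_extend(a, sew) * (b & mask)) & ((1 << (2 * sew)) - 1)


  signed_product = vwmul(0xFF, 0xFF, sew=8)      # (-1) * (-1)
  unsigned_product = vwmulu(0xFF, 0xFF, sew=8)   # 255 * 255
  mixed_product = vwmulsu(0xFF, 0xFF, sew=8)     # (-1) * 255

The three results differ for identical input bit patterns, which is why separate signed, unsigned, and mixed opcodes exist.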

************************************************************
Vector Single-Width Integer Multiply-Add Instructions
************************************************************

..
  The integer multiply-add instructions are destructive and are provided
  in two forms, one that overwrites the addend or minuend
  (`vmacc`, `vnmsac`) and one that overwrites the first multiplicand
  (`vmadd`, `vnmsub`).

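The integer multiply-add instructions are destructive and are provided
in two forms, one that overwrites the addend or minuend
(`vmacc`, `vnmsac`) and one that overwrites the first multiplicand
(`vmadd`, `vnmsub`).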

..
  The low half of the product is added or subtracted from the third operand.

The low half of the product is added to or subtracted from the third operand.

..
  NOTE: `sac` is intended to be read as "subtract from accumulator". The
  opcode is `vnmsac` to match the (unfortunately counterintuitive)
  floating-point `fnmsub` instruction definition.  Similarly for the
  `vnmsub` opcode.

.. note::

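  `sac` is intended to be read as "subtract from accumulator".  The
  opcode is `vnmsac` to match the (unfortunately counterintuitive)
  floating-point `fnmsub` instruction definition.  The same applies to the
  `vnmsub` opcode.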
  
..
  ----
  # Integer multiply-add, overwrite addend
  vmacc.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
  vmacc.vx vd, rs1, vs2, vm    # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
  
  # Integer multiply-sub, overwrite minuend
  vnmsac.vv vd, vs1, vs2, vm    # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
  vnmsac.vx vd, rs1, vs2, vm    # vd[i] = -(x[rs1] * vs2[i]) + vd[i]
  
  # Integer multiply-add, overwrite multiplicand
  vmadd.vv vd, vs1, vs2, vm    # vd[i] = (vs1[i] * vd[i]) + vs2[i]
  vmadd.vx vd, rs1, vs2, vm    # vd[i] = (x[rs1] * vd[i]) + vs2[i]
  
  # Integer multiply-sub, overwrite multiplicand
  vnmsub.vv vd, vs1, vs2, vm    # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
  vnmsub.vx vd, rs1, vs2, vm    # vd[i] = -(x[rs1] * vd[i]) + vs2[i]
  ----


::

  # Integer multiply-add, overwrite addend
  vmacc.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
  vmacc.vx vd, rs1, vs2, vm    # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
  
  # Integer multiply-sub, overwrite minuend
  vnmsac.vv vd, vs1, vs2, vm    # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
  vnmsac.vx vd, rs1, vs2, vm    # vd[i] = -(x[rs1] * vs2[i]) + vd[i]
  
  # Integer multiply-add, overwrite multiplicand
  vmadd.vv vd, vs1, vs2, vm    # vd[i] = (vs1[i] * vd[i]) + vs2[i]
  vmadd.vx vd, rs1, vs2, vm    # vd[i] = (x[rs1] * vd[i]) + vs2[i]
  
  # Integer multiply-sub, overwrite multiplicand
  vnmsub.vv vd, vs1, vs2, vm    # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
  vnmsub.vx vd, rs1, vs2, vm    # vd[i] = -(x[rs1] * vd[i]) + vs2[i]
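
The per-element formulas in the listing above can be modelled in a few lines of Python. This is only an illustrative sketch, not part of the specification: the function names and the helper `wrap` are hypothetical, masking and `vl` handling are ignored, and elements are plain Python integers that are truncated to the low SEW bits after each operation.

.. code-block:: python

  def wrap(value, sew):
      """Keep only the low SEW bits, as a SEW-bit element would."""
      return value & ((1 << sew) - 1)

  def vmacc(vd, vs1, vs2, sew):
      # vd[i] = +(vs1[i] * vs2[i]) + vd[i]   (overwrites the addend)
      return [wrap(a * b + d, sew) for d, a, b in zip(vd, vs1, vs2)]

  def vnmsac(vd, vs1, vs2, sew):
      # vd[i] = -(vs1[i] * vs2[i]) + vd[i]   (overwrites the minuend)
      return [wrap(-(a * b) + d, sew) for d, a, b in zip(vd, vs1, vs2)]

  def vmadd(vd, vs1, vs2, sew):
      # vd[i] = (vs1[i] * vd[i]) + vs2[i]    (overwrites the multiplicand)
      return [wrap(a * d + b, sew) for d, a, b in zip(vd, vs1, vs2)]

  def vnmsub(vd, vs1, vs2, sew):
      # vd[i] = -(vs1[i] * vd[i]) + vs2[i]   (overwrites the multiplicand)
      return [wrap(-(a * d) + b, sew) for d, a, b in zip(vd, vs1, vs2)]

Because only the low SEW bits are kept, the same model covers signed and unsigned interpretations of the elements.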
  

**************************************************
Vector Widening Integer Multiply-Add Instructions
**************************************************

..
  The widening integer multiply-add instructions add the full 2*SEW-bit
  product from a SEW-bit*SEW-bit multiply to a 2*SEW-bit value and
  produce a 2*SEW-bit result.  All combinations of signed and unsigned
  multiply operands are supported.

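The widening integer multiply-add instructions add the full 2*SEW-bit product from a SEW-bit*SEW-bit multiply to a 2*SEW-bit value
and produce a 2*SEW-bit result.
All combinations of signed and unsigned multiply operands are supported.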

..
  ----
  # Widening unsigned-integer multiply-add, overwrite addend
  vwmaccu.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
  vwmaccu.vx vd, rs1, vs2, vm    # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
  
  # Widening signed-integer multiply-add, overwrite addend
  vwmacc.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
  vwmacc.vx vd, rs1, vs2, vm    # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
  
  # Widening signed-unsigned-integer multiply-add, overwrite addend
  vwmaccsu.vv vd, vs1, vs2, vm  # vd[i] = +(signed(vs1[i]) * unsigned(vs2[i])) + vd[i]
  vwmaccsu.vx vd, rs1, vs2, vm  # vd[i] = +(signed(x[rs1]) * unsigned(vs2[i])) + vd[i]
  
  # Widening unsigned-signed-integer multiply-add, overwrite addend
  vwmaccus.vx vd, rs1, vs2, vm  # vd[i] = +(unsigned(x[rs1]) * signed(vs2[i])) + vd[i]
  ----


::

  # Widening unsigned-integer multiply-add, overwrite addend
  vwmaccu.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
  vwmaccu.vx vd, rs1, vs2, vm    # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
  
  # Widening signed-integer multiply-add, overwrite addend
  vwmacc.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
  vwmacc.vx vd, rs1, vs2, vm    # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
  
  # Widening signed-unsigned-integer multiply-add, overwrite addend
  vwmaccsu.vv vd, vs1, vs2, vm  # vd[i] = +(signed(vs1[i]) * unsigned(vs2[i])) + vd[i]
  vwmaccsu.vx vd, rs1, vs2, vm  # vd[i] = +(signed(x[rs1]) * unsigned(vs2[i])) + vd[i]
  
  # Widening unsigned-signed-integer multiply-add, overwrite addend
  vwmaccus.vx vd, rs1, vs2, vm  # vd[i] = +(unsigned(x[rs1]) * signed(vs2[i])) + vd[i]
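
The four signedness combinations differ only in how each SEW-bit source is interpreted before the multiply. The following Python sketch illustrates this under simplifying assumptions: all names are hypothetical, sources are raw SEW-bit patterns, the accumulator is a raw 2*SEW-bit pattern, and masking is ignored.

.. code-block:: python

  def as_unsigned(value, bits):
      return value & ((1 << bits) - 1)

  def as_signed(value, bits):
      value &= (1 << bits) - 1
      return value - (1 << bits) if value >> (bits - 1) else value

  def _widening_macc(vd, a, b, sew, view_a, view_b):
      # Full-precision product, accumulated into a 2*SEW-bit element.
      return [as_unsigned(view_a(x, sew) * view_b(y, sew) + d, 2 * sew)
              for d, x, y in zip(vd, a, b)]

  def vwmaccu(vd, vs1, vs2, sew):
      return _widening_macc(vd, vs1, vs2, sew, as_unsigned, as_unsigned)

  def vwmacc(vd, vs1, vs2, sew):
      return _widening_macc(vd, vs1, vs2, sew, as_signed, as_signed)

  def vwmaccsu(vd, vs1, vs2, sew):
      # signed(vs1) * unsigned(vs2)
      return _widening_macc(vd, vs1, vs2, sew, as_signed, as_unsigned)

  def vwmaccus(vd, rs1, vs2, sew):
      # unsigned(x[rs1]) * signed(vs2); only a vector-scalar form exists
      return _widening_macc(vd, [rs1] * len(vs2), vs2, sew, as_unsigned, as_signed)

With SEW=8 and both sources equal to ``0xFF``, the unsigned form accumulates 255*255, the signed form accumulates (-1)*(-1), and the mixed forms accumulate -255.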
  

**************************************************
Vector Integer Merge Instructions
**************************************************

..
  The vector integer merge instructions combine two source operands
  based on a mask.  Unlike regular arithmetic instructions, the
  merge operates on all body elements (i.e., the set of elements from
  `vstart` up to the current vector length in `vl`).

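The vector integer merge instructions combine two source operands based on a mask.
Unlike regular arithmetic instructions, the merge operates on all body elements (i.e., the set of elements from `vstart` up to the current vector length in `vl`).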

..
  The `vmerge` instructions are encoded as masked instructions (`vm=0`).
  The instructions combine two
  sources as follows.  At elements where the mask value is zero, the
  first operand is copied to the destination element, otherwise the
  second operand is copied to the destination element.  The first
  operand is always a vector register group specified by `vs2`.  The
  second operand is a vector register group specified by `vs1` or a
  scalar `x` register specified by `rs1` or a 5-bit sign-extended
  immediate.


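The `vmerge` instructions are encoded as masked instructions (`vm=0`).
The instructions combine two sources as follows.
At elements where the mask value is zero, the first operand is copied to the destination element; otherwise the second operand is copied to the destination element.
The first operand is always a vector register group specified by `vs2`.
The second operand is a vector register group specified by `vs1`,
a scalar `x` register specified by `rs1`, or a 5-bit sign-extended immediate.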

::

  vmerge.vvm vd, vs2, vs1, v0  # vd[i] = v0.mask[i] ? vs1[i] : vs2[i]
  vmerge.vxm vd, vs2, rs1, v0  # vd[i] = v0.mask[i] ? x[rs1] : vs2[i]
  vmerge.vim vd, vs2, imm, v0  # vd[i] = v0.mask[i] ? imm    : vs2[i]
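
As a rough illustration of the selection rule, the merge can be written as a one-line Python comprehension. The function name `vmerge` and its argument order are assumptions made for this sketch; `second` stands for either a vector (the `.vvm` form) or a scalar or immediate (the `.vxm` and `.vim` forms).

.. code-block:: python

  def vmerge(vs2, second, mask):
      """vd[i] = mask[i] ? second[i] : vs2[i] over the body elements."""
      if isinstance(second, int):
          # Scalar or immediate operand: use the same value for every element.
          second = [second] * len(vs2)
      return [s if m else f for f, s, m in zip(vs2, second, mask)]

  merged = vmerge([1, 2, 3, 4], [10, 20, 30, 40], [0, 1, 0, 1])
  widened = vmerge([0, 0, 0, 0], 1, [1, 0, 1, 1])

Merging the scalar 1 into an all-zero vector, as in the second call, turns each mask bit into a 0 or 1 element.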
  

**************************************************
Vector Integer Move Instructions
**************************************************

..
  The vector integer move instructions copy a source operand to a vector
  register group.
  The `vmv.v.v` variant copies a vector register group, whereas the `vmv.v.x`
  and `vmv.v.i` variants **splat** a scalar register or immediate to all active
  elements of the destination vector register group.
  These instructions are encoded as unmasked instructions (`vm=1`).
  The first operand specifier (`vs2`) must contain `v0`, and any other vector
  register number in `vs2` is *reserved*.


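The vector integer move instructions copy a source operand to a vector register group.
The `vmv.v.v` variant copies a vector register group, whereas the `vmv.v.x` and `vmv.v.i` variants **splat** a scalar register or immediate to all active elements of the destination vector register group.
These instructions are encoded as unmasked instructions (`vm=1`).
The first operand specifier (`vs2`) must contain `v0`, and any other vector register number in `vs2` is *reserved*.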

::

  vmv.v.v vd, vs1 # vd[i] = vs1[i]
  vmv.v.x vd, rs1 # vd[i] = x[rs1]
  vmv.v.i vd, imm # vd[i] = imm
  

..
  NOTE: Mask values can be widened into SEW-width elements using a
  sequence `vmv.v.i vd, 0; vmerge.vim vd, vd, 1, v0`.

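.. note::

  Mask values can be widened into SEW-width elements using the sequence `vmv.v.i vd, 0; vmerge.vim vd, vd, 1, v0`.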
  
..
  NOTE: The vector integer move instructions share the encoding with the vector
  merge instructions, but with `vm=1` and `vs2=v0`.

.. note::

  The vector integer move instructions share the encoding with the vector merge instructions, but with `vm=1` and `vs2=v0`.

..
  The form `vmv.v.v vd, vd`, which leaves body elements unchanged,
  is used as a hint to indicate that the register will next be used
  with an EEW equal to SEW.

The form `vmv.v.v vd, vd`, which leaves body elements unchanged, is used as a hint to indicate that the register will next be used with an EEW equal to SEW.

..
  NOTE: Implementations that internally reorganize data according to EEW
  can shuffle the internal representation according to SEW.
  Implementations that do not internally reorganize data can dynamically
  elide this instruction, and treat as a NOP.

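.. note::

  Implementations that internally reorganize data according to EEW can shuffle the internal representation according to SEW.
  Implementations that do not internally reorganize data can dynamically elide this instruction and treat it as a NOP.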
  
.. _sec-vector-fixed-point:

##################################################
Vector Fixed-Point Arithmetic Instructions
##################################################

..
  The preceding set of integer arithmetic instructions is extended to support
  fixed-point arithmetic.

The preceding set of integer arithmetic instructions is extended to support fixed-point arithmetic.

..
  A fixed-point number is a two's-complement signed or unsigned integer
  interpreted as the numerator in a fraction with an implicit denominator.
  The fixed-point instructions are intended to be applied to the numerators;
  it is the responsibility of software to manage the denominators.
  An N-bit element can hold two's-complement signed integers in the
  range -2^N-1^...+2^N-1^-1, and unsigned integers in the range
  0...+2^N^-1.  The fixed-point instructions help preserve precision in
  narrow operands by supporting scaling and rounding, and can handle
  overflow by saturating results into the destination format range.


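A fixed-point number is a two's-complement signed or unsigned integer interpreted as the numerator in a fraction with an implicit denominator.
The fixed-point instructions are intended to be applied to the numerators; it is the responsibility of software to manage the denominators.
An N-bit element can hold two's-complement signed integers in the range -2^(N-1) to +2^(N-1)-1, and unsigned integers in the range 0 to +2^N-1.
The fixed-point instructions help preserve precision in narrow operands by supporting scaling and rounding,
and can handle overflow by saturating results into the destination format range.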

..
  NOTE: The widening integer operations described above can also be used
  to avoid overflow.

.. note::

  The widening integer operations described above can also be used to avoid overflow.
  
**************************************************
Vector Single-Width Saturating Add and Subtract
**************************************************

..
  Saturating forms of integer add and subtract are provided, for both
  signed and unsigned integers.  If the result would overflow the
  destination, the result is replaced with the closest representable
  value, and the `vxsat` bit is set.

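Saturating forms of integer add and subtract are provided, for both signed and unsigned integers.
If the result would overflow the destination, the result is replaced with the closest representable value, and the `vxsat` bit is set.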

..
  ----
  # Saturating adds of unsigned integers.
  vsaddu.vv vd, vs2, vs1, vm   # Vector-vector
  vsaddu.vx vd, vs2, rs1, vm   # vector-scalar
  vsaddu.vi vd, vs2, imm, vm   # vector-immediate
  
  # Saturating adds of signed integers.
  vsadd.vv vd, vs2, vs1, vm   # Vector-vector
  vsadd.vx vd, vs2, rs1, vm   # vector-scalar
  vsadd.vi vd, vs2, imm, vm   # vector-immediate
  
  # Saturating subtract of unsigned integers.
  vssubu.vv vd, vs2, vs1, vm   # Vector-vector
  vssubu.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Saturating subtract of signed integers.
  vssub.vv vd, vs2, vs1, vm   # Vector-vector
  vssub.vx vd, vs2, rs1, vm   # vector-scalar
  ----


::

  # Saturating adds of unsigned integers.
  vsaddu.vv vd, vs2, vs1, vm   # vector-vector
  vsaddu.vx vd, vs2, rs1, vm   # vector-scalar
  vsaddu.vi vd, vs2, imm, vm   # vector-immediate
  
  # Saturating adds of signed integers.
  vsadd.vv vd, vs2, vs1, vm   # vector-vector
  vsadd.vx vd, vs2, rs1, vm   # vector-scalar
  vsadd.vi vd, vs2, imm, vm   # vector-immediate
  
  # Saturating subtract of unsigned integers.
  vssubu.vv vd, vs2, vs1, vm   # vector-vector
  vssubu.vx vd, vs2, rs1, vm   # vector-scalar
  
  # Saturating subtract of signed integers.
  vssub.vv vd, vs2, vs1, vm   # vector-vector
  vssub.vx vd, vs2, rs1, vm   # vector-scalar
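
Saturation can be sketched as a clamp that also reports whether clamping happened. The Python below is a simplified model with hypothetical function names: elements are Python integers already in range for their signedness, masking is ignored, and the returned flag only says whether this one operation would set `vxsat`.

.. code-block:: python

  def _saturating(vs2, vs1, op, lo, hi):
      result, vxsat = [], False
      for a, b in zip(vs2, vs1):
          exact = op(a, b)
          clamped = min(max(exact, lo), hi)   # closest representable value
          vxsat = vxsat or clamped != exact
          result.append(clamped)
      return result, vxsat

  def vsaddu(vs2, vs1, sew):
      return _saturating(vs2, vs1, lambda a, b: a + b, 0, (1 << sew) - 1)

  def vsadd(vs2, vs1, sew):
      return _saturating(vs2, vs1, lambda a, b: a + b,
                         -(1 << (sew - 1)), (1 << (sew - 1)) - 1)

  def vssubu(vs2, vs1, sew):
      return _saturating(vs2, vs1, lambda a, b: a - b, 0, (1 << sew) - 1)

  def vssub(vs2, vs1, sew):
      return _saturating(vs2, vs1, lambda a, b: a - b,
                         -(1 << (sew - 1)), (1 << (sew - 1)) - 1)

For example, with SEW=8 an unsigned 250 + 10 saturates to 255 and a signed -100 + -100 saturates to -128.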
  

**************************************************
Vector Single-Width Averaging Add and Subtract
**************************************************

..
  The averaging add and subtract instructions right shift the result by
  one bit and round off the result according to the setting in `vxrm`.
  Both unsigned and signed versions are provided.
  For `vaaddu` and `vaadd` there can be no overflow in the result.
  For `vasub` and `vasubu`, overflow is ignored and the result wraps around.

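The averaging add and subtract instructions right shift the result by one bit and round off the result according to the setting in `vxrm`.
Both unsigned and signed versions are provided.
For `vaaddu` and `vaadd` there can be no overflow in the result.
For `vasub` and `vasubu`, overflow is ignored and the result wraps around.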

..
  NOTE: For `vasub`, overflow occurs only when subtracting the smallest number
  from the largest number under `rnu` or `rne` rounding.

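.. note::

  For `vasub`, overflow occurs only when subtracting the smallest number from the largest number under `rnu` or `rne` rounding.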
  
..
  ----
  # Averaging add
  
  # Averaging adds of unsigned integers.
  vaaddu.vv vd, vs2, vs1, vm   # roundoff*unsigned(vs2[i] + vs1[i], 1)
  vaaddu.vx vd, vs2, rs1, vm   # roundoff*unsigned(vs2[i] + x[rs1], 1)
  
  # Averaging adds of signed integers.
  vaadd.vv vd, vs2, vs1, vm   # roundoff*signed(vs2[i] + vs1[i], 1)
  vaadd.vx vd, vs2, rs1, vm   # roundoff*signed(vs2[i] + x[rs1], 1)
  
  # Averaging subtract
  
  # Averaging subtract of unsigned integers.
  vasubu.vv vd, vs2, vs1, vm   # roundoff*unsigned(vs2[i] - vs1[i], 1)
  vasubu.vx vd, vs2, rs1, vm   # roundoff*unsigned(vs2[i] - x[rs1], 1)
  
  # Averaging subtract of signed integers.
  vasub.vv vd, vs2, vs1, vm   # roundoff*signed(vs2[i] - vs1[i], 1)
  vasub.vx vd, vs2, rs1, vm   # roundoff*signed(vs2[i] - x[rs1], 1)
  ----


::

  # Averaging add
  
  # Averaging adds of unsigned integers.
  vaaddu.vv vd, vs2, vs1, vm   # roundoff*unsigned(vs2[i] + vs1[i], 1)
  vaaddu.vx vd, vs2, rs1, vm   # roundoff*unsigned(vs2[i] + x[rs1], 1)
  
  # Averaging adds of signed integers.
  vaadd.vv vd, vs2, vs1, vm   # roundoff*signed(vs2[i] + vs1[i], 1)
  vaadd.vx vd, vs2, rs1, vm   # roundoff*signed(vs2[i] + x[rs1], 1)
  
  # Averaging subtract
  
  # Averaging subtract of unsigned integers.
  vasubu.vv vd, vs2, vs1, vm   # roundoff*unsigned(vs2[i] - vs1[i], 1)
  vasubu.vx vd, vs2, rs1, vm   # roundoff*unsigned(vs2[i] - x[rs1], 1)
  
  # Averaging subtract of signed integers.
  vasub.vv vd, vs2, vs1, vm   # roundoff*signed(vs2[i] - vs1[i], 1)
  vasub.vx vd, vs2, rs1, vm   # roundoff*signed(vs2[i] - x[rs1], 1)
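
The `roundoff` helper named in the listing can be modelled in Python as a right shift plus a mode-dependent increment. The sketch below is illustrative only: the mode strings, function names, and the exact increment rules are written here as I understand the `vxrm` description elsewhere in the specification, and Python's unbounded integers stand in for the extra carry bit that hardware keeps before the shift.

.. code-block:: python

  def roundoff(v, d, vxrm):
      """Return (v >> d) + r, with r chosen by the fixed-point rounding mode."""
      if d == 0:
          return v
      half = (v >> (d - 1)) & 1               # v[d-1], the first discarded bit
      below_half = v & ((1 << (d - 1)) - 1)   # v[d-2:0], the remaining discarded bits
      lsb = (v >> d) & 1                      # v[d], the lowest bit that is kept
      r = {
          "rnu": half,                                           # round-to-nearest-up
          "rne": half & int(below_half != 0 or lsb == 1),        # round-to-nearest-even
          "rdn": 0,                                              # round-down (truncate)
          "rod": int(lsb == 0 and (half != 0 or below_half != 0)),  # round-to-odd
      }[vxrm]
      return (v >> d) + r

  def to_signed(value, sew):
      value &= (1 << sew) - 1
      return value - (1 << sew) if value >> (sew - 1) else value

  def vaadd(vs2, vs1, vxrm):
      # The sum of two SEW-bit values shifted right by one always fits in SEW bits.
      return [roundoff(a + b, 1, vxrm) for a, b in zip(vs2, vs1)]

  def vasub(vs2, vs1, sew, vxrm):
      # Overflow is ignored: the rounded result wraps to SEW bits.
      return [to_signed(roundoff(a - b, 1, vxrm), sew) for a, b in zip(vs2, vs1)]

With SEW=8, ``vasub([127], [-128], 8, "rnu")`` wraps to -128, which is the single overflowing case for `vasub`; under ``"rdn"`` the same inputs give 127.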
  

**********************************************************************
Vector Single-Width Fractional Multiply with Rounding and Saturation
**********************************************************************

..
  The signed fractional multiply instruction produces a 2*SEW product of
  the two SEW inputs, then shifts the result right by SEW-1 bits,
  rounding these bits according to `vxrm`, then saturates the result to
  fit into SEW bits.  If the result causes saturation, the `vxsat` bit
  is set.

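The signed fractional multiply instruction produces a 2*SEW product of the two SEW inputs, then shifts the result right by SEW-1 bits, rounding these bits according to `vxrm`, then saturates the result to fit into SEW bits.
If the result causes saturation, the `vxsat` bit is set.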

..
  ----
  # Signed saturating and rounding fractional multiply
  # See vxrm  description for rounding calculation
  vsmul.vv vd, vs2, vs1, vm  # vd[i] = clip(roundoff*signed(vs2[i]*vs1[i], SEW-1))
  vsmul.vx vd, vs2, rs1, vm  # vd[i] = clip(roundoff*signed(vs2[i]*x[rs1], SEW-1))
  ----


::

  # Signed saturating and rounding fractional multiply
  # See vxrm description for rounding calculation
  vsmul.vv vd, vs2, vs1, vm  # vd[i] = clip(roundoff*signed(vs2[i]*vs1[i], SEW-1))
  vsmul.vx vd, vs2, rs1, vm  # vd[i] = clip(roundoff*signed(vs2[i]*x[rs1], SEW-1))
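
A single `vsmul` element can be traced with the Python sketch below. It is a hedged model rather than a reference implementation: the names are hypothetical, the `roundoff` helper restates the `vxrm` rounding increments as I understand them from the rest of the specification, and inputs are signed Python integers in SEW-bit range.

.. code-block:: python

  def roundoff(v, d, vxrm):
      """Return (v >> d) + r, with r chosen by the fixed-point rounding mode."""
      if d == 0:
          return v
      half = (v >> (d - 1)) & 1
      below_half = v & ((1 << (d - 1)) - 1)
      lsb = (v >> d) & 1
      r = {
          "rnu": half,
          "rne": half & int(below_half != 0 or lsb == 1),
          "rdn": 0,
          "rod": int(lsb == 0 and (half != 0 or below_half != 0)),
      }[vxrm]
      return (v >> d) + r

  def vsmul_element(a, b, sew, vxrm="rnu"):
      """Return (result, saturated) for clip(roundoff_signed(a * b, SEW-1))."""
      rounded = roundoff(a * b, sew - 1, vxrm)        # full 2*SEW product, then scale
      lo, hi = -(1 << (sew - 1)), (1 << (sew - 1)) - 1
      clipped = min(max(rounded, lo), hi)             # saturate into SEW bits
      return clipped, clipped != rounded

Reading SEW=8 elements as fractions with denominator 128, ``vsmul_element(64, 64, 8)`` multiplies 0.5 by 0.5 and yields 32 (0.25), while ``vsmul_element(-128, -128, 8)`` is the one product that saturates.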
  

..
  NOTE: When multiplying two N-bit signed numbers, the largest magnitude
  is obtained for -2^N-1^ * -2^N-1^ producing a result +2^2N-2^, which
  has a single (zero) sign bit when held in 2N bits.  All other products
  have two sign bits in 2N bits.  To retain greater precision in N
  result bits, the product is shifted right by one bit less than N,
  saturating the largest magnitude result but increasing result
  precision by one bit for all other products.


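.. note::

  When multiplying two N-bit signed numbers, the largest magnitude is obtained for -2^(N-1) * -2^(N-1),
  producing a result +2^(2N-2), which has a single (zero) sign bit when held in 2N bits.
  All other products have two sign bits in 2N bits.
  To retain greater precision in N result bits, the product is shifted right by one bit less than N,
  saturating the largest magnitude result but increasing result precision by one bit for all other products.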
  
..
  NOTE: We do not provide an equivalent fractional multiply where one
  input is unsigned, as these would retain all upper SEW bits and would
  not need to saturate.  This operation is partly covered by the
  `vmulhu` and `vmulhsu` instructions, for the case where rounding is
  simply truncation (`rdn`).

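.. note::

  We do not provide an equivalent fractional multiply where one input is unsigned, as these would retain all upper SEW bits and would not need to saturate.
  This operation is partly covered by the `vmulhu` and `vmulhsu` instructions, for the case where rounding is simply truncation (`rdn`).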
  

**************************************************
Vector Single-Width Scaling Shift Instructions
**************************************************
..
  These instructions shift the input value right, and round off the
  shifted out bits according to `vxrm`.  The scaling right shifts have
  both zero-extending (`vssrl`) and sign-extending (`vssra`) forms.
  The low lg2(SEW) bits of the vector or scalar shift-amount value are used;
  shift-amount immediates are zero-extended.

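These instructions shift the input value right, and round off the shifted-out bits according to `vxrm`.
The scaling right shifts have both zero-extending (`vssrl`) and sign-extending (`vssra`) forms.
The low lg2(SEW) bits of the vector or scalar shift-amount value are used; shift-amount immediates are zero-extended.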

..
  ----
   # Scaling shift right logical
   vssrl.vv vd, vs2, vs1, vm   # vd[i] = roundoff*unsigned(vs2[i], vs1[i])
   vssrl.vx vd, vs2, rs1, vm   # vd[i] = roundoff*unsigned(vs2[i], x[rs1])
   vssrl.vi vd, vs2, uimm, vm  # vd[i] = roundoff*unsigned(vs2[i], uimm)
  
   # Scaling shift right arithmetic
   vssra.vv vd, vs2, vs1, vm   # vd[i] = roundoff*signed(vs2[i],vs1[i])
   vssra.vx vd, vs2, rs1, vm   # vd[i] = roundoff*signed(vs2[i], x[rs1])
   vssra.vi vd, vs2, uimm, vm  # vd[i] = roundoff*signed(vs2[i], uimm)
  ----

::

   # Scaling shift right logical
   vssrl.vv vd, vs2, vs1, vm   # vd[i] = roundoff*unsigned(vs2[i], vs1[i])
   vssrl.vx vd, vs2, rs1, vm   # vd[i] = roundoff*unsigned(vs2[i], x[rs1])
   vssrl.vi vd, vs2, uimm, vm  # vd[i] = roundoff*unsigned(vs2[i], uimm)
  
   # Scaling shift right arithmetic
   vssra.vv vd, vs2, vs1, vm   # vd[i] = roundoff*signed(vs2[i],vs1[i])
   vssra.vx vd, vs2, rs1, vm   # vd[i] = roundoff*signed(vs2[i], x[rs1])
   vssra.vi vd, vs2, uimm, vm  # vd[i] = roundoff*signed(vs2[i], uimm)
  

**************************************************
Vector Narrowing Fixed-Point Clip Instructions
**************************************************

..
  The `vnclip` instructions are used to pack a fixed-point value into a
  narrower destination.  The instructions support rounding, scaling, and
  saturation into the final destination format.

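The `vnclip` instructions are used to pack a fixed-point value into a narrower destination.
The instructions support rounding, scaling, and saturation into the final destination format.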

..
  The second argument (vector element, scalar value, immediate value)
  gives the amount to right shift the source as in the narrowing shift
  instructions, which provides the scaling.  The low lg2(2*SEW) bits of
  the vector or scalar shift-amount value are used (e.g., the low 6 bits
  for a SEW=64-bit to SEW=32-bit narrowing operation).  The immediate
  forms zero-extend their shift-amount immediate operand.

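The second argument (vector element, scalar value, immediate value) gives the amount to right shift the source as in the narrowing shift instructions, which provides the scaling.
The low lg2(2*SEW) bits of the vector or scalar shift-amount value are used (e.g., the low 6 bits for a SEW=64-bit to SEW=32-bit narrowing operation).
The immediate forms zero-extend their shift-amount immediate operand.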

..
  ----
  # Narrowing unsigned clip
  #                                SEW                            2*SEW   SEW
   vnclipu.wv vd, vs2, vs1, vm  # vd[i] = clip(roundoff*unsigned(vs2[i], vs1[i]))
   vnclipu.wx vd, vs2, rs1, vm  # vd[i] = clip(roundoff*unsigned(vs2[i], x[rs1]))
   vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff*unsigned(vs2[i], uimm))
  
  # Narrowing signed clip
   vnclip.wv vd, vs2, vs1, vm   # vd[i] = clip(roundoff*signed(vs2[i], vs1[i]))
   vnclip.wx vd, vs2, rs1, vm   # vd[i] = clip(roundoff*signed(vs2[i], x[rs1]))
   vnclip.wi vd, vs2, uimm, vm  # vd[i] = clip(roundoff*signed(vs2[i], uimm))
  ----


::

  # Narrowing unsigned clip
  #                                SEW                            2*SEW   SEW
   vnclipu.wv vd, vs2, vs1, vm  # vd[i] = clip(roundoff*unsigned(vs2[i], vs1[i]))
   vnclipu.wx vd, vs2, rs1, vm  # vd[i] = clip(roundoff*unsigned(vs2[i], x[rs1]))
   vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff*unsigned(vs2[i], uimm))
  
  # Narrowing signed clip
   vnclip.wv vd, vs2, vs1, vm   # vd[i] = clip(roundoff*signed(vs2[i], vs1[i]))
   vnclip.wx vd, vs2, rs1, vm   # vd[i] = clip(roundoff*signed(vs2[i], x[rs1]))
   vnclip.wi vd, vs2, uimm, vm  # vd[i] = clip(roundoff*signed(vs2[i], uimm))
  

..
  For `vnclipu`/`vnclip`, the rounding mode is specified in the `vxrm`
  CSR.  Rounding occurs around the least-significant bit of the
  destination and before saturation.

For `vnclipu`/`vnclip`, the rounding mode is specified in the `vxrm` CSR.
Rounding occurs around the least-significant bit of the destination and before saturation.

..
  For `vnclipu`, the shifted rounded source value is treated as an
  unsigned integer and saturates if the result would overflow the
  destination viewed as an unsigned integer.

For `vnclipu`, the shifted rounded source value is treated as an unsigned integer
and saturates if the result would overflow the destination viewed as an unsigned integer.

..
  NOTE: There is no single instruction that can saturate a signed value
  into an unsigned destination.  A sequence of two vector instructions
  that first removes negative numbers by performing a max against 0
  using `vmax`, then clips the resulting unsigned value into the
  destination using `vnclipu`, can be used if setting `vxsat` value is
  not required.  A `vsetvli` is required inbetween these two
  instructions to change SEW.

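.. note::

  There is no single instruction that can saturate a signed value into an unsigned destination.
  If setting the `vxsat` value is not required, a sequence of two vector instructions can be used: first remove negative numbers by performing a max against 0 using `vmax`,
  then clip the resulting unsigned value into the destination using `vnclipu`.
  A `vsetvli` is required in between these two instructions to change SEW.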
  
..
  For `vnclip`, the shifted rounded source value is treated as a signed
  integer and saturates if the result would overflow the destination viewed
  as a signed integer.

For `vnclip`, the shifted rounded source value is treated as a signed integer
and saturates if the result would overflow the destination viewed as a signed integer.

..
  If any destination element is saturated, the `vxsat` bit is set in the
  `vxsat` register.

If any destination element is saturated, the `vxsat` bit is set in the `vxsat` register.
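
The order of operations (mask the shift amount, shift with rounding, then saturate) can be sketched in Python as follows. This is an illustrative model with hypothetical names: `wide` is a 2*SEW-bit source element given as a Python integer, the destination is SEW bits wide, and the `roundoff` helper restates the `vxrm` rounding increments as I understand them from the rest of the specification.

.. code-block:: python

  def roundoff(v, d, vxrm):
      """Return (v >> d) + r, with r chosen by the fixed-point rounding mode."""
      if d == 0:
          return v
      half = (v >> (d - 1)) & 1
      below_half = v & ((1 << (d - 1)) - 1)
      lsb = (v >> d) & 1
      r = {
          "rnu": half,
          "rne": half & int(below_half != 0 or lsb == 1),
          "rdn": 0,
          "rod": int(lsb == 0 and (half != 0 or below_half != 0)),
      }[vxrm]
      return (v >> d) + r

  def _clip(wide, shift, sew, vxrm, lo, hi):
      shift &= 2 * sew - 1                  # only the low lg2(2*SEW) bits are used
      rounded = roundoff(wide, shift, vxrm) # rounding happens before saturation
      clipped = min(max(rounded, lo), hi)
      return clipped, clipped != rounded    # (result, would set vxsat)

  def vnclipu_element(wide, shift, sew, vxrm="rnu"):
      return _clip(wide, shift, sew, vxrm, 0, (1 << sew) - 1)

  def vnclip_element(wide, shift, sew, vxrm="rnu"):
      return _clip(wide, shift, sew, vxrm, -(1 << (sew - 1)), (1 << (sew - 1)) - 1)

  # Signed source into an unsigned destination: max against 0, then unsigned clip.
  signed_to_unsigned = vnclipu_element(max(-7, 0), 0, 8)

The last line mirrors the two-instruction `vmax` and `vnclipu` sequence: the negative input becomes 0 without reporting saturation, which is why that sequence only applies when `vxsat` is not needed.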

.. _sec-vector-float:

##################################################
Vector Floating-Point Instructions
##################################################

..
  The standard vector floating-point instructions treat 16-bit, 32-bit,
  64-bit, and 128-bit elements as IEEE-754/2008-compatible values.  If
  the EEW of a vector floating-point operand does not correspond to a
  supported IEEE floating-point type, the instruction encoding is
  reserved.

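The standard vector floating-point instructions treat 16-bit, 32-bit, 64-bit, and 128-bit elements as IEEE-754/2008-compatible values.
If the EEW of a vector floating-point operand does not correspond to a supported IEEE floating-point type,
the instruction encoding is reserved.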

..
  NOTE: The floating-point element widths that are supported depend on
  the profile.

.. note::

  The floating-point element widths that are supported depend on the profile.
  
..
  Vector floating-point instructions require the presence of base scalar
  floating-point extensions corresponding to the supported vector
  floating-point element widths.

Vector floating-point instructions require the presence of base scalar floating-point extensions
corresponding to the supported vector floating-point element widths.

..
  NOTE: In particular, vector profiles supporting 16-bit half-precision
  floating-point values will also have to implement scalar
  half-precision floating-point support in the `f` registers.

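.. note::

  In particular, vector profiles supporting 16-bit half-precision floating-point values will also have to implement scalar half-precision floating-point support in the `f` registers.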
  
..
  If the floating-point unit status field `mstatus.FS` is `Off` then any
  attempt to execute a vector floating-point instruction will raise an
  illegal instruction exception.  Any vector floating-point instruction
  that modifies any floating-point extension state (i.e., floating-point
  CSRs or `f` registers) must set `mstatus.FS` to `Dirty`.

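If the floating-point unit status field `mstatus.FS` is `Off`,
then any attempt to execute a vector floating-point instruction will raise an illegal instruction exception.
Any vector floating-point instruction that modifies any floating-point extension state (i.e., floating-point CSRs or `f` registers)
must set `mstatus.FS` to `Dirty`.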

..
  The vector floating-point instructions have the same behavior as the
  scalar floating-point instructions with regard to NaNs.

The vector floating-point instructions have the same behavior as the scalar floating-point instructions with regard to NaNs.

..
  Scalar values for vector-scalar operations can be sourced from the
  standard scalar `f` registers, as described in Section
  :ref:`sec-arithmetic-encoding` .

Scalar values for vector-scalar operations can be sourced from the standard scalar `f` registers,
as described in Section :ref:`sec-arithmetic-encoding`.


**************************************************
Vector Floating-Point Exception Flags
**************************************************

..
  A vector floating-point exception at any active floating-point element
  sets the standard FP exception flags in the `fflags` register.  Inactive
  elements do not set FP exception flags.

A vector floating-point exception at any active floating-point element sets the standard FP exception flags in the `fflags` register.
Inactive elements do not set FP exception flags.


**********************************************************************
Vector Single-Width Floating-Point Add/Subtract Instructions
**********************************************************************
..
  ----
      # Floating-point add
      vfadd.vv vd, vs2, vs1, vm   # Vector-vector
      vfadd.vf vd, vs2, rs1, vm   # vector-scalar
  
      # Floating-point subtract
      vfsub.vv vd, vs2, vs1, vm   # Vector-vector
      vfsub.vf vd, vs2, rs1, vm   # Vector-scalar vd[i] = vs2[i] - f[rs1]
      vfrsub.vf vd, vs2, rs1, vm  # Scalar-vector vd[i] = f[rs1] - vs2[i]
  ----


::

      # Floating-point add
      vfadd.vv vd, vs2, vs1, vm   # vector-vector
      vfadd.vf vd, vs2, rs1, vm   # vector-scalar
  
      # Floating-point subtract
      vfsub.vv vd, vs2, vs1, vm   # vector-vector
      vfsub.vf vd, vs2, rs1, vm   # vector-scalar vd[i] = vs2[i] - f[rs1]
      vfrsub.vf vd, vs2, rs1, vm  # scalar-vector vd[i] = f[rs1] - vs2[i]
  

**********************************************************************
Vector Widening Floating-Point Add/Subtract Instructions
**********************************************************************

..
  ----
  # Widening FP add/subtract, 2*SEW = SEW +/- SEW
  vfwadd.vv vd, vs2, vs1, vm  # vector-vector
  vfwadd.vf vd, vs2, rs1, vm  # vector-scalar
  vfwsub.vv vd, vs2, vs1, vm  # vector-vector
  vfwsub.vf vd, vs2, rs1, vm  # vector-scalar
  
  # Widening FP add/subtract, 2*SEW = 2*SEW +/- SEW
  vfwadd.wv  vd, vs2, vs1, vm  # vector-vector
  vfwadd.wf  vd, vs2, rs1, vm  # vector-scalar
  vfwsub.wv  vd, vs2, vs1, vm  # vector-vector
  vfwsub.wf  vd, vs2, rs1, vm  # vector-scalar
  ----


::

  # Widening FP add/subtract, 2*SEW = SEW +/- SEW
  vfwadd.vv vd, vs2, vs1, vm  # vector-vector
  vfwadd.vf vd, vs2, rs1, vm  # vector-scalar
  vfwsub.vv vd, vs2, vs1, vm  # vector-vector
  vfwsub.vf vd, vs2, rs1, vm  # vector-scalar
  
  # Widening FP add/subtract, 2*SEW = 2*SEW +/- SEW
  vfwadd.wv  vd, vs2, vs1, vm  # vector-vector
  vfwadd.wf  vd, vs2, rs1, vm  # vector-scalar
  vfwsub.wv  vd, vs2, vs1, vm  # vector-vector
  vfwsub.wf  vd, vs2, rs1, vm  # vector-scalar
  

**********************************************************************
Vector Single-Width Floating-Point Multiply/Divide Instructions
**********************************************************************

..
  ----
      # Floating-point multiply
      vfmul.vv vd, vs2, vs1, vm   # Vector-vector
      vfmul.vf vd, vs2, rs1, vm   # vector-scalar
  
      # Floating-point divide
      vfdiv.vv vd, vs2, vs1, vm   # Vector-vector
      vfdiv.vf vd, vs2, rs1, vm   # vector-scalar
  
      # Reverse floating-point divide vector = scalar / vector
      vfrdiv.vf vd, vs2, rs1, vm  # scalar-vector, vd[i] = f[rs1]/vs2[i]
  ----


::

      # Floating-point multiply
      vfmul.vv vd, vs2, vs1, vm   # vector-vector
      vfmul.vf vd, vs2, rs1, vm   # vector-scalar
  
      # Floating-point divide
      vfdiv.vv vd, vs2, vs1, vm   # vector-vector
      vfdiv.vf vd, vs2, rs1, vm   # vector-scalar
  
      # Reverse floating-point divide vector = scalar / vector
      vfrdiv.vf vd, vs2, rs1, vm  # scalar-vector, vd[i] = f[rs1]/vs2[i]
  

**************************************************
Vector Widening Floating-Point Multiply
**************************************************

..
  ----
  # Widening floating-point multiply
  vfwmul.vv    vd, vs2, vs1, vm # vector-vector
  vfwmul.vf    vd, vs2, rs1, vm # vector-scalar
  ----

::

  # Widening floating-point multiply
  vfwmul.vv    vd, vs2, vs1, vm # vector-vector
  vfwmul.vf    vd, vs2, rs1, vm # vector-scalar
  

**********************************************************************
Vector Single-Width Floating-Point Fused Multiply-Add Instructions
**********************************************************************

..
  All four varieties of fused multiply-add are provided, and in two
  destructive forms that overwrite one of the operands, either the
  addend or the first multiplicand.

All four varieties of fused multiply-add are provided, and in two destructive forms that overwrite one of the operands, either the addend or the first multiplicand.

::

  # FP multiply-accumulate, overwrites addend
  vfmacc.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
  vfmacc.vf vd, rs1, vs2, vm    # vd[i] = +(f[rs1] * vs2[i]) + vd[i]
  
  # FP negate-(multiply-accumulate), overwrites subtrahend
  vfnmacc.vv vd, vs1, vs2, vm   # vd[i] = -(vs1[i] * vs2[i]) - vd[i]
  vfnmacc.vf vd, rs1, vs2, vm   # vd[i] = -(f[rs1] * vs2[i]) - vd[i]
  
  # FP multiply-subtract-accumulator, overwrites subtrahend
  vfmsac.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) - vd[i]
  vfmsac.vf vd, rs1, vs2, vm    # vd[i] = +(f[rs1] * vs2[i]) - vd[i]
  
  # FP negate-(multiply-subtract-accumulator), overwrites minuend
  vfnmsac.vv vd, vs1, vs2, vm   # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
  vfnmsac.vf vd, rs1, vs2, vm   # vd[i] = -(f[rs1] * vs2[i]) + vd[i]
  
  # FP multiply-add, overwrites multiplicand
  vfmadd.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vd[i]) + vs2[i]
  vfmadd.vf vd, rs1, vs2, vm    # vd[i] = +(f[rs1] * vd[i]) + vs2[i]
  
  # FP negate-(multiply-add), overwrites multiplicand
  vfnmadd.vv vd, vs1, vs2, vm   # vd[i] = -(vs1[i] * vd[i]) - vs2[i]
  vfnmadd.vf vd, rs1, vs2, vm   # vd[i] = -(f[rs1] * vd[i]) - vs2[i]
  
  # FP multiply-sub, overwrites multiplicand
  vfmsub.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vd[i]) - vs2[i]
  vfmsub.vf vd, rs1, vs2, vm    # vd[i] = +(f[rs1] * vd[i]) - vs2[i]
  
  # FP negate-(multiply-sub), overwrites multiplicand
  vfnmsub.vv vd, vs1, vs2, vm   # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
  vfnmsub.vf vd, rs1, vs2, vm   # vd[i] = -(f[rs1] * vd[i]) + vs2[i]
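
The eight sign and operand conventions listed above can be checked against a small Python model. This is a sketch under stated assumptions: the function names are hypothetical, each function handles one element, and the helper `fma` emulates a single-rounding fused operation for finite double-precision values by computing exactly with `fractions.Fraction` and rounding once (round-to-nearest-even). It does not model NaNs, infinities, the sign of zero results, or exception flags.

.. code-block:: python

  from fractions import Fraction

  def fma(a, b, c):
      """a * b + c with one rounding, for finite floats only."""
      return float(Fraction(a) * Fraction(b) + Fraction(c))

  # Forms that overwrite the addend (vd is the accumulator).
  def vfmacc(vd, vs1, vs2):  return fma(vs1, vs2, vd)     # +(vs1*vs2) + vd
  def vfnmacc(vd, vs1, vs2): return fma(-vs1, vs2, -vd)   # -(vs1*vs2) - vd
  def vfmsac(vd, vs1, vs2):  return fma(vs1, vs2, -vd)    # +(vs1*vs2) - vd
  def vfnmsac(vd, vs1, vs2): return fma(-vs1, vs2, vd)    # -(vs1*vs2) + vd

  # Forms that overwrite the multiplicand (vd is a product input).
  def vfmadd(vd, vs1, vs2):  return fma(vs1, vd, vs2)     # +(vs1*vd) + vs2
  def vfnmadd(vd, vs1, vs2): return fma(-vs1, vd, -vs2)   # -(vs1*vd) - vs2
  def vfmsub(vd, vs1, vs2):  return fma(vs1, vd, -vs2)    # +(vs1*vd) - vs2
  def vfnmsub(vd, vs1, vs2): return fma(-vs1, vd, vs2)    # -(vs1*vd) + vs2

The single rounding is what separates a fused operation from a separate multiply and add: ``fma(1 + 2**-52, 1 - 2**-52, -1.0)`` returns ``-(2.0 ** -104)``, whereas the unfused expression ``(1 + 2**-52) * (1 - 2**-52) - 1.0`` evaluates to ``0.0``.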
  

..
  NOTE: It would be possible to use the two unused rounding modes in the
  scalar FP FMA encoding to provide a few non-destructive FMAs.
  However, this would be the only maskable operation with three inputs
  and separate output.

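.. note::

  It would be possible to use the two unused rounding modes in the scalar FP FMA encoding to provide a few non-destructive FMAs.
  However, this would be the only maskable operation with three inputs and separate output.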
  

**********************************************************************
Vector Widening Floating-Point Fused Multiply-Add Instructions
**********************************************************************

..
  The widening floating-point fused multiply-add instructions all
  overwrite the wide addend with the result.  The multiplier inputs are
  all SEW wide, while the addend and destination is 2*SEW bits wide.


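The widening floating-point fused multiply-add instructions all overwrite the wide addend with the result.
The multiplier inputs are all SEW wide, while the addend and destination are 2*SEW bits wide.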

::

  # FP widening multiply-accumulate, overwrites addend
  vfwmacc.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
  vfwmacc.vf vd, rs1, vs2, vm    # vd[i] = +(f[rs1] * vs2[i]) + vd[i]
  
  # FP widening negate-(multiply-accumulate), overwrites addend
  vfwnmacc.vv vd, vs1, vs2, vm   # vd[i] = -(vs1[i] * vs2[i]) - vd[i]
  vfwnmacc.vf vd, rs1, vs2, vm   # vd[i] = -(f[rs1] * vs2[i]) - vd[i]
  
  # FP widening multiply-subtract-accumulator, overwrites addend
  vfwmsac.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) - vd[i]
  vfwmsac.vf vd, rs1, vs2, vm    # vd[i] = +(f[rs1] * vs2[i]) - vd[i]
  
  # FP widening negate-(multiply-subtract-accumulator), overwrites addend
  vfwnmsac.vv vd, vs1, vs2, vm   # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
  vfwnmsac.vf vd, rs1, vs2, vm   # vd[i] = -(f[rs1] * vs2[i]) + vd[i]
  

**************************************************
Vector Floating-Point Square-Root Instruction
**************************************************

..
  This is a unary vector-vector instruction.

This is a unary vector-vector instruction.

..
  ----
      # Floating-point square root
      vfsqrt.v vd, vs2, vm   # Vector-vector square root
  ----

::

      # Floating-point square root
      vfsqrt.v vd, vs2, vm   # vector-vector square root
  

**********************************************************************
Vector Floating-Point Reciprocal Square-Root Estimate Instruction
**********************************************************************

..
  ----
      # Floating-point reciprocal square-root estimate to 7 bits.
      vfrsqrt7.v vd, vs2, vm
  ----

::

      # Floating-point reciprocal square-root estimate to 7 bits.
      vfrsqrt7.v vd, vs2, vm
  

..
  This is a unary vector-vector instruction that returns an estimate of
  1/sqrt(x) accurate to 7 bits.

This is a unary vector-vector instruction that returns an estimate of 1/sqrt(x) accurate to 7 bits.

..
  NOTE: An earlier draft version had used the assembler name `vfrsqrte7`
  but this was deemed to cause confusion with the ``e``**x** notation for element
  width.  The earlier name can be retained as alias in tool chains for
  backward compatibility.

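.. note::

  An earlier draft version had used the assembler name `vfrsqrte7`, but this was deemed to cause confusion with the ``e``\ *x* notation for element width.
  The earlier name can be retained as an alias in tool chains for backward compatibility.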
  
..
  The following table describes the instruction's behavior for all
  classes of floating-point inputs:

The following table describes the instruction's behavior for all classes of floating-point inputs:

..


..
  NOTE: All positive normal and subnormal inputs produce normal outputs.

.. note::

  All positive normal and subnormal inputs produce normal outputs.
  
..
  NOTE: The output value is independent of the dynamic rounding mode.

.. note::

  The output value is independent of the dynamic rounding mode.
  
..
  For the non-exceptional cases, the low bit of the exponent and the six high
  bits of significand (after the leading one) are concatenated and used to
  address the following table.
  The output of the table becomes the seven high bits of the result significand
  (after the leading one); the remainder of the result significand is zero.
  Subnormal inputs are normalized and the exponent adjusted appropriately before
  the lookup.
  The output exponent is chosen to make the result approximate the reciprocal of
  the square root of the argument.

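For the non-exceptional cases, the low bit of the exponent and the six high bits of significand (after the leading one) are concatenated and used to address the following table.
The output of the table becomes the seven high bits of the result significand (after the leading one); the remainder of the result significand is zero.
Subnormal inputs are normalized and the exponent adjusted appropriately before the lookup.
The output exponent is chosen to make the result approximate the reciprocal of the square root of the argument.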

..
  More precisely, the result is computed as follows.
  Let the normalized input exponent be equal to the input exponent if the input
  is normal, or 0 minus the number of leading zeros in the significand
  otherwise.
  If the input is subnormal, the normalized input significand is given by
  shifting the input significand left by 1 minus the normalized input exponent,
  discarding the leading 1 bit.
  The output exponent equals floor((3*B - 1 - the normalized input exponent) / 2).
  The output sign equals the input sign.

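More precisely, the result is computed as follows.
Let the normalized input exponent be equal to the input exponent if the input is normal, or 0 minus the number of leading zeros in the significand otherwise.
If the input is subnormal, the normalized input significand is given by shifting the input significand left by 1 minus the normalized input exponent, discarding the leading 1 bit.
The output exponent equals floor((3*B - 1 - the normalized input exponent) / 2), where *B* is the exponent bias.
The output sign equals the input sign.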
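
The exponent rule can be checked in isolation with a short Python sketch. It is illustrative only: the function name and parameters are hypothetical, it covers just the output exponent field for positive, finite, nonzero inputs, and it omits the significand lookup, which needs the table. The defaults describe SEW=32 (8 exponent bits, 23 significand bits).

.. code-block:: python

  def vfrsqrt7_output_exponent(bits, exp_bits=8, sig_bits=23):
      """Biased output exponent for a positive, finite, nonzero input pattern."""
      bias = (1 << (exp_bits - 1)) - 1
      exp = (bits >> sig_bits) & ((1 << exp_bits) - 1)
      sig = bits & ((1 << sig_bits) - 1)
      if exp == 0:
          # Subnormal: 0 minus the number of leading zeros in the significand.
          normalized_exp = -(sig_bits - sig.bit_length())
      else:
          normalized_exp = exp
      return (3 * bias - 1 - normalized_exp) // 2   # floor division

  subnormal_case = vfrsqrt7_output_exponent(0x00718ABC)
  large_case = vfrsqrt7_output_exponent(0x7F765432)

For the two SEW=32 inputs used as examples in the specification, the sketch gives 190 and 63, which match the exponent fields of the documented results ``0x5f080000`` and ``0x1f820000``.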

..
  The following table gives the seven MSBs of the output significand as a
  function of the LSB of the normalized input exponent and the six MSBs of the
  normalized input significand; the other bits of the output significand are zero.

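The following table gives the seven MSBs of the output significand as a function of the LSB of the normalized input exponent and the six MSBs of the normalized input significand; the other bits of the output significand are zero.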

include::vfrsqrt7.adoc[]

..
  NOTE: For example, when SEW=32, vfrsqrt7(0x00718abc ({approx} 1.043e-38))
  = 0x5f080000 ({approx} 9.800e18), and vfrsqrt7(0x7f765432 ({approx} 3.274e38))
  = 0x1f820000 ({approx} 5.506e-20).

.. note::

  For example, when SEW=32, vfrsqrt7(0x00718abc (≈ 1.043e-38)) = 0x5f080000 (≈ 9.800e18),
  and vfrsqrt7(0x7f765432 (≈ 3.274e38)) = 0x1f820000 (≈ 5.506e-20).
  
..
  NOTE: The 7 bit accuracy was chosen as it requires 0,1,2,3
  Newton-Raphson iterations to converge to close to bfloat16, FP16,
  FP32, FP64 accuracy respectively.   Future instructions can be defined
  with greater estimate accuracy.

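.. note::

  The 7-bit accuracy was chosen as it requires 0, 1, 2, 3 Newton-Raphson iterations to converge to close to bfloat16, FP16, FP32, FP64 accuracy respectively.
  Future instructions can be defined with greater estimate accuracy.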
  

**********************************************************************
Vector Floating-Point Reciprocal Estimate Instruction
**********************************************************************

..
  ----
      # Floating-point reciprocal estimate to 7 bits.
      vfrec7.v vd, vs2, vm
  ----

::

      # Floating-point reciprocal estimate to 7 bits.
      vfrec7.v vd, vs2, vm
  

..
  NOTE: An earlier draft version had used the assembler name `vfrece7`
  but this was deemed to cause confusion with ``e``**x** notation for element
  width.  The earlier name can be retained as alias in tool chains for
  backward compatibility.

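.. note::

  An earlier draft version used the assembler name `vfrece7`,
  but this was deemed to cause confusion with the ``e``\ **x** notation for element width.
  The earlier name can be retained as an alias in toolchains for backward compatibility.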
  
..
  This is a unary vector-vector instruction that returns an estimate of
  1/x accurate to 7 bits.

This is a unary vector-vector instruction that returns an estimate of 1/x accurate to 7 bits.

..
  The following table describes the instruction's behavior for all
  classes of floating-point inputs, where *B* is the exponent bias:

The following table describes the instruction's behavior for all classes of floating-point inputs, where *B* is the exponent bias:

..


..
  NOTE: Subnormal inputs with magnitude at least 2^-(B+1)^ produce normal outputs;
  other subnormal inputs produce infinite outputs.
  Normal inputs with magnitude at least 2^B-1^ produce subnormal outputs;
  other normal inputs produce normal outputs.

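.. note::

  Subnormal inputs with magnitude at least 2\ :sup:`-(B+1)` produce normal outputs;
  other subnormal inputs produce infinite outputs.
  Normal inputs with magnitude at least 2\ :sup:`B-1` produce subnormal outputs;
  other normal inputs produce normal outputs.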
  
..
  NOTE: The output value depends on the dynamic rounding mode when
  the overflow exception is raised.

.. note::

  The output value depends on the dynamic rounding mode when the overflow exception is raised.
  
..
  For the non-exceptional cases, the seven high bits of significand (after the
  leading one) are used to address the following table.
  The output of the table becomes the seven high bits of the result significand
  (after the leading one); the remainder of the result significand is zero.
  Subnormal inputs are normalized and the exponent adjusted appropriately before
  the lookup.
  The output exponent is chosen to make the result approximate the reciprocal of
  the argument, and subnormal outputs are denormalized accordingly.

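For the non-exceptional cases, the seven high bits of the significand (after the leading one) are used to address the following table.
The output of the table becomes the seven high bits of the result significand (after the leading one); the remainder of the result significand is zero.
Subnormal inputs are normalized and the exponent adjusted appropriately before the lookup.
The output exponent is chosen to make the result approximate the reciprocal of the argument, and subnormal outputs are denormalized accordingly.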

..
  More precisely, the result is computed as follows.
  Let the normalized input exponent be equal to the input exponent if the input
  is normal, or 0 minus the number of leading zeros in the significand
  otherwise.
  The normalized output exponent equals (2*B - 1 - the normalized input exponent).
  If the normalized output exponent is outside the range [-1, 2*B], the result
  corresponds to one of the exceptional cases in the table above.

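More precisely, the result is computed as follows.
Let the normalized input exponent be equal to the input exponent if the input is normal, or 0 minus the number of leading zeros in the significand otherwise.
The normalized output exponent equals (2*B - 1 - the normalized input exponent).
If the normalized output exponent is outside the range [-1, 2*B], the result corresponds to one of the exceptional cases in the table above.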
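
As an illustration of the exponent computation, the sketch below evaluates the normalized output exponent for FP32 (SEW=32) and reports whether it falls in the non-exceptional range. The function name is hypothetical, and only finite nonzero inputs are handled; zeros, infinities, and NaNs are left to the table of exceptional cases.

.. code-block:: python

   def vfrec7_norm_out_exponent_fp32(bits: int) -> tuple[int, bool]:
       """Return (normalized output exponent, in_range) for an FP32 input."""
       sig_bits, bias = 23, 127
       exp = (bits >> sig_bits) & 0xFF
       sig = bits & ((1 << sig_bits) - 1)
       if exp != 0:
           norm_in = exp                                # normal input
       else:
           norm_in = -(sig_bits - sig.bit_length())     # 0 - leading zeros
       norm_out = 2 * bias - 1 - norm_in
       in_range = -1 <= norm_out <= 2 * bias
       return norm_out, in_range

   print(vfrec7_norm_out_exponent_fp32(0x00718ABC))     # (253, True)
   print(vfrec7_norm_out_exponent_fp32(0x7F765432))     # (-1, True)
   print(vfrec7_norm_out_exponent_fp32(0x00100000))     # (255, False)

The last input is a subnormal with magnitude 2\ :sup:`-129`, below the 2\ :sup:`-(B+1)` threshold, so its normalized output exponent falls outside the range and the exceptional-case table applies.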

..
  If the input is subnormal, the normalized input significand is given by
  shifting the input significand left by 1 minus the normalized input exponent,
  discarding the leading 1 bit.
  Otherwise, the normalized input significand equals the input significand.
  The following table gives the seven MSBs of the normalized output significand
  as a function of the seven MSBs of the normalized input significand; the other
  bits of the normalized output significand are zero.

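If the input is subnormal, the normalized input significand is given by shifting the input significand left by 1 minus the normalized input exponent, discarding the leading 1 bit.
Otherwise, the normalized input significand equals the input significand.
The following table gives the seven MSBs of the normalized output significand as a function of the seven MSBs of the normalized input significand; the other bits of the normalized output significand are zero.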

include::vfrec7.adoc[]

..
  If the normalized output exponent is 0 or -1, the result is subnormal: the
  output exponent is 0, and the output significand is given by concatenating
  a 1 bit to the left of the normalized output significand, then shifting that
  quantity right by 1 minus the normalized output exponent.
  Otherwise, the output exponent equals the normalized output exponent, and the
  output significand equals the normalized output significand.
  The output sign equals the input sign.

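If the normalized output exponent is 0 or -1, the result is subnormal:
the output exponent is 0, and the output significand is given by concatenating a 1 bit to the left of the normalized output significand,
then shifting that quantity right by 1 minus the normalized output exponent.
Otherwise, the output exponent equals the normalized output exponent, and the output significand equals the normalized output significand.
The output sign equals the input sign.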
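
The final packing step, including denormalization, can be sketched as follows for FP32. The function name is hypothetical, and the 7-bit table outputs used in the usage lines are inferred from the example results quoted for this instruction rather than read from the lookup table.

.. code-block:: python

   def vfrec7_pack_fp32(sign: int, norm_out_exp: int, sig7: int) -> int:
       """Assemble the FP32 result from the sign, the normalized output
       exponent, and the seven MSBs of the normalized output significand."""
       sig_bits = 23
       norm_out_sig = sig7 << (sig_bits - 7)       # remaining bits are zero
       if norm_out_exp in (0, -1):                 # subnormal result
           out_exp = 0
           with_leading_one = (1 << sig_bits) | norm_out_sig
           out_sig = with_leading_one >> (1 - norm_out_exp)
       else:
           out_exp = norm_out_exp
           out_sig = norm_out_sig
       return (sign << 31) | (out_exp << sig_bits) | out_sig

   print(hex(vfrec7_pack_fp32(0, 253, 0b0010000)))   # 0x7e900000
   print(hex(vfrec7_pack_fp32(0, -1, 0b0000101)))    # 0x214000

With a normalized output exponent of -1, the significand with its leading 1 is shifted right by two positions, which reproduces the subnormal result 0x00214000.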

..
  NOTE: For example, when SEW=32, vfrec7(0x00718abc ({approx} 1.043e-38))
  = 0x7e900000 ({approx} 9.570e37), and vfrec7(0x7f765432 ({approx} 3.274e38))
  = 0x00214000 ({approx} 3.053e-39).

.. note::

  For example, when SEW=32, vfrec7(0x00718abc (≈ 1.043e-38)) = 0x7e900000 (≈ 9.570e37),
  and vfrec7(0x7f765432 (≈ 3.274e38)) = 0x00214000 (≈ 3.053e-39).
  
..
  NOTE: The 7 bit accuracy was chosen as it requires 0,1,2,3
  Newton-Raphson iterations to converge to close to bfloat16, FP16,
  FP32, FP64 accuracy respectively.   Future instructions can be defined
  with greater estimate accuracy.

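.. note::

  The 7-bit accuracy was chosen because it requires 0, 1, 2, and 3 Newton-Raphson iterations
  to converge to close to bfloat16, FP16, FP32, and FP64 accuracy, respectively.
  Future instructions can be defined with greater estimate accuracy.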
  

**********************************************************************
Vector Floating-Point MIN/MAX Instructions
**********************************************************************

..
  The vector floating-point `vfmin` and `vfmax` instructions have the
  same behavior as the corresponding scalar floating-point instructions
  in version 2.2 of the RISC-V F/D/Q extension.

The vector floating-point `vfmin` and `vfmax` instructions have the same behavior as the corresponding scalar floating-point instructions in version 2.2 of the RISC-V F/D/Q extension.


..
  ----
      # Floating-point minimum
      vfmin.vv vd, vs2, vs1, vm   # Vector-vector
      vfmin.vf vd, vs2, rs1, vm   # vector-scalar
  
      # Floating-point maximum
      vfmax.vv vd, vs2, vs1, vm   # Vector-vector
      vfmax.vf vd, vs2, rs1, vm   # vector-scalar
  ----


::

      # Floating-point minimum
      vfmin.vv vd, vs2, vs1, vm   # Vector-vector
      vfmin.vf vd, vs2, rs1, vm   # vector-scalar

      # Floating-point maximum
      vfmax.vv vd, vs2, vs1, vm   # Vector-vector
      vfmax.vf vd, vs2, rs1, vm   # vector-scalar
  

**********************************************************************
Vector Floating-Point Sign-Injection Instructions
**********************************************************************

..
  Vector versions of the scalar sign-injection instructions.  The result
  takes all bits except the sign bit from the vector `vs2` operands.


These are vector versions of the scalar sign-injection instructions.
The result takes all bits except the sign bit from the vector `vs2` operands.

..
  ----
      vfsgnj.vv vd, vs2, vs1, vm   # Vector-vector
      vfsgnj.vf vd, vs2, rs1, vm   # vector-scalar
  
      vfsgnjn.vv vd, vs2, vs1, vm  # Vector-vector
      vfsgnjn.vf vd, vs2, rs1, vm  # vector-scalar
  
      vfsgnjx.vv vd, vs2, vs1, vm  # Vector-vector
      vfsgnjx.vf vd, vs2, rs1, vm  # vector-scalar
  ----

::

      vfsgnj.vv vd, vs2, vs1, vm   # Vector-vector
      vfsgnj.vf vd, vs2, rs1, vm   # vector-scalar

      vfsgnjn.vv vd, vs2, vs1, vm  # Vector-vector
      vfsgnjn.vf vd, vs2, rs1, vm  # vector-scalar

      vfsgnjx.vv vd, vs2, vs1, vm  # Vector-vector
      vfsgnjx.vf vd, vs2, rs1, vm  # vector-scalar
  

..
  NOTE: A vector of floating-point values can be negated using a
  sign-injection instruction with both source operands set to the same
  vector operand.  Can define assembly pseudoinstruction `vfneg.v vd,vs`
  = `vfsgnjn.vv vd,vs,vs`.


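.. note::

  A vector of floating-point values can be negated using a sign-injection instruction
  with both source operands set to the same vector operand.
  The assembly pseudoinstruction `vfneg.v vd,vs` = `vfsgnjn.vv vd,vs,vs` can be defined.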
  
..
  NOTE: The absolute value of a vector of floating-point elements can be
  calculated using a sign-injection instruction with both source
  operands set to the same vector operand.  Can define assembly
  pseudoinstruction `vfabs.v vd,vs` = `vfsgnjx.vv vd,vs,vs`.


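.. note::

  The absolute value of a vector of floating-point elements can be calculated using a
  sign-injection instruction with both source operands set to the same vector operand.
  The assembly pseudoinstruction `vfabs.v vd,vs` = `vfsgnjx.vv vd,vs,vs` can be defined.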
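
A bit-level sketch of the three sign-injection variants for SEW=32 shows why passing the same operand twice gives negation and absolute value. It assumes the sign source follows the scalar instructions (the sign of `vs1`, its inverse, or the XOR of both signs); the Python function names only mirror the mnemonics, and masking and `vl` handling are omitted.

.. code-block:: python

   SIGN = 1 << 31          # sign bit for SEW=32
   MAG = SIGN - 1          # all bits except the sign bit

   def vfsgnj(vs2, vs1):
       """Magnitude from vs2, sign from vs1."""
       return [(a & MAG) | (b & SIGN) for a, b in zip(vs2, vs1)]

   def vfsgnjn(vs2, vs1):
       """Magnitude from vs2, inverted sign of vs1."""
       return [(a & MAG) | (~b & SIGN) for a, b in zip(vs2, vs1)]

   def vfsgnjx(vs2, vs1):
       """Magnitude from vs2, sign is sign(vs2) XOR sign(vs1)."""
       return [a ^ (b & SIGN) for a, b in zip(vs2, vs1)]

   v = [0x3F800000, 0xC0000000]                 # 1.0, -2.0
   print([hex(x) for x in vfsgnjn(v, v)])       # negation: -1.0, 2.0
   print([hex(x) for x in vfsgnjx(v, v)])       # absolute value: 1.0, 2.0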
  

**********************************************************************
Vector Floating-Point Compare Instructions
**********************************************************************

..
  These vector FP compare instructions compare two source operands and
  write the comparison result to a mask register.  The destination mask
  vector is always held in a single vector register, with a layout of
  elements as described in Section :ref:`sec-mask-register-layout` .  The
  destination mask vector register may be the same as the source vector
  mask register (`v0`).  Comparisons write mask registers, and so always
  operate under a tail-agnostic policy.


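These vector FP compare instructions compare two source operands and write the comparison result to a mask register.
The destination mask vector is always held in a single vector register, with a layout of elements as described in Section :ref:`sec-mask-register-layout`.
The destination mask vector register may be the same as the source vector mask register (`v0`).
Comparisons write mask registers, and so always operate under a tail-agnostic policy.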

..
  The compare instructions follow the semantics of the scalar
  floating-point compare instructions.  `vmfeq` and `vmfne` raise the invalid
  operation exception only on signaling NaN inputs.  `vmflt`, `vmfle`, `vmfgt`,
  and `vmfge` raise the invalid operation exception on both signaling and
  quiet NaN inputs.
  `vmfne` writes 1 to the destination element when either
  operand is NaN, whereas the other comparisons write 0 when either operand
  is NaN.

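The compare instructions follow the semantics of the scalar floating-point compare instructions.
`vmfeq` and `vmfne` raise the invalid operation exception only on signaling NaN inputs.
`vmflt`, `vmfle`, `vmfgt`, and `vmfge` raise the invalid operation exception on both signaling and quiet NaN inputs.
`vmfne` writes 1 to the destination element when either operand is NaN,
whereas the other comparisons write 0 when either operand is NaN.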
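
The NaN rules can be summarized per element as in the sketch below, which operates on FP32 bit patterns. The `compare` helper and its return convention (mask bit, invalid flag) are assumptions made for illustration only.

.. code-block:: python

   import struct

   def is_nan(bits):
       return (bits & 0x7FFFFFFF) > 0x7F800000

   def is_snan(bits):
       return is_nan(bits) and not (bits & 0x00400000)

   def to_float(bits):
       return struct.unpack("<f", struct.pack("<I", bits))[0]

   def compare(op, a, b):
       """Return (mask_bit, invalid_flag) for one pair of FP32 elements."""
       any_nan = is_nan(a) or is_nan(b)
       if op in ("vmfeq", "vmfne"):
           invalid = is_snan(a) or is_snan(b)   # quiet comparisons
       else:
           invalid = any_nan                    # signaling comparisons
       if any_nan:
           return (1 if op == "vmfne" else 0), invalid
       x, y = to_float(a), to_float(b)
       result = {"vmfeq": x == y, "vmfne": x != y, "vmflt": x < y,
                 "vmfle": x <= y, "vmfgt": x > y, "vmfge": x >= y}[op]
       return int(result), invalid

   QNAN, SNAN, ONE = 0x7FC00000, 0x7F800001, 0x3F800000
   print(compare("vmfeq", QNAN, ONE))   # (0, False)
   print(compare("vmfne", QNAN, ONE))   # (1, False)
   print(compare("vmflt", QNAN, ONE))   # (0, True)
   print(compare("vmfeq", SNAN, ONE))   # (0, True)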

..
  ----
      # Compare equal
      vmfeq.vv vd, vs2, vs1, vm  # Vector-vector
      vmfeq.vf vd, vs2, rs1, vm  # vector-scalar
  
      # Compare not equal
      vmfne.vv vd, vs2, vs1, vm  # Vector-vector
      vmfne.vf vd, vs2, rs1, vm  # vector-scalar
  
      # Compare less than
      vmflt.vv vd, vs2, vs1, vm  # Vector-vector
      vmflt.vf vd, vs2, rs1, vm  # vector-scalar
  
      # Compare less than or equal
      vmfle.vv vd, vs2, vs1, vm  # Vector-vector
      vmfle.vf vd, vs2, rs1, vm  # vector-scalar
  
      # Compare greater than
      vmfgt.vf vd, vs2, rs1, vm  # vector-scalar
  
      # Compare greater than or equal
      vmfge.vf vd, vs2, rs1, vm  # vector-scalar
  ----


::

      # Compare equal
      vmfeq.vv vd, vs2, vs1, vm  # Vector-vector
      vmfeq.vf vd, vs2, rs1, vm  # vector-scalar

      # Compare not equal
      vmfne.vv vd, vs2, vs1, vm  # Vector-vector
      vmfne.vf vd, vs2, rs1, vm  # vector-scalar

      # Compare less than
      vmflt.vv vd, vs2, vs1, vm  # Vector-vector
      vmflt.vf vd, vs2, rs1, vm  # vector-scalar

      # Compare less than or equal
      vmfle.vv vd, vs2, vs1, vm  # Vector-vector
      vmfle.vf vd, vs2, rs1, vm  # vector-scalar

      # Compare greater than
      vmfgt.vf vd, vs2, rs1, vm  # vector-scalar

      # Compare greater than or equal
      vmfge.vf vd, vs2, rs1, vm  # vector-scalar
  

..
  ----
  Comparison      Assembler Mapping             Assembler pseudoinstruction
  
  va < vb         vmflt.vv vd, va, vb, vm
  va <= vb        vmfle.vv vd, va, vb, vm
  va > vb         vmflt.vv vd, vb, va, vm    vmfgt.vv vd, va, vb, vm
  va >= vb        vmfle.vv vd, vb, va, vm    vmfge.vv vd, va, vb, vm
  
  va < f          vmflt.vf vd, va, f, vm
  va <= f         vmfle.vf vd, va, f, vm
  va > f          vmfgt.vf vd, va, f, vm
  va >= f         vmfge.vf vd, va, f, vm
  
  va, vb vector register groups
  f      scalar floating-point register
  ----


::

  Comparison      Assembler Mapping          Assembler pseudoinstruction
  
  va < vb         vmflt.vv vd, va, vb, vm
  va <= vb        vmfle.vv vd, va, vb, vm
  va > vb         vmflt.vv vd, vb, va, vm    vmfgt.vv vd, va, vb, vm
  va >= vb        vmfle.vv vd, vb, va, vm    vmfge.vv vd, va, vb, vm
  
  va < f          vmflt.vf vd, va, f, vm
  va <= f         vmfle.vf vd, va, f, vm
  va > f          vmfgt.vf vd, va, f, vm
  va >= f         vmfge.vf vd, va, f, vm
  
  va, vb vector register groups
  f      scalar floating-point register
  

..
  NOTE: Providing all forms is necessary to correctly handle unordered
  comparisons for NaNs.


.. note::

  Providing all forms is necessary to correctly handle unordered comparisons for NaNs.
  
..
  NOTE: C99 floating-point quiet comparisons can be implemented by masking
  the signaling comparisons when either input is NaN, as follows.  When
  the comparand is a non-NaN constant, the middle two instructions can be
  omitted.


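.. note::

  C99 floating-point quiet comparisons can be implemented by masking the signaling comparisons
  when either input is NaN, as follows.
  When the comparand is a non-NaN constant, the middle two instructions can be omitted.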
  
..
  ----
      # Example of implementing isgreater()
      vmfeq.vv v0, va, va        # Only set where A is not NaN.
      vmfeq.vv v1, vb, vb        # Only set where B is not NaN.
      vmand.mm v0, v0, v1        # Only set where A and B are ordered,
      vmfgt.vv v0, va, vb, v0.t  #  so only set flags on ordered values.
  ----


::

      # Example of implementing isgreater()
      vmfeq.vv v0, va, va        # Only set where A is not NaN.
      vmfeq.vv v1, vb, vb        # Only set where B is not NaN.
      vmand.mm v0, v0, v1        # Only set where A and B are ordered,
      vmfgt.vv v0, va, vb, v0.t  #  so only set flags on ordered values.
  

..
  NOTE: In the above sequence, it is tempting to mask the second `vmfeq`
  instruction and remove the `vmand` instruction, but this more efficient
  sequence incorrectly fails to raise the invalid exception when an
  element of `va` contains a quiet NaN and the corresponding element in
  `vb` contains a signaling NaN.

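.. note::

  In the above sequence, it is tempting to mask the second `vmfeq` instruction and remove the `vmand` instruction,
  but this more efficient sequence incorrectly fails to raise the invalid exception when an element of `va`
  contains a quiet NaN and the corresponding element in `vb` contains a signaling NaN.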
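
The difference between the two sequences can be reproduced with a small model of `vmfeq.vv vd, v, v`. This is a sketch under stated assumptions: the helper names are hypothetical, elements are FP32 bit patterns, and inactive mask elements are assumed to keep their old value of 0.

.. code-block:: python

   def is_nan(bits):
       return (bits & 0x7FFFFFFF) > 0x7F800000

   def is_snan(bits):
       return is_nan(bits) and not (bits & 0x00400000)

   def vmfeq_self(v, mask=None):
       """Model vmfeq.vv vd, v, v [, v0.t]: return (mask bits, invalid flag)."""
       bits, invalid = [], False
       for i, x in enumerate(v):
           if mask is not None and not mask[i]:
               bits.append(0)           # inactive: element is not examined
               continue
           invalid |= is_snan(x)        # vmfeq signals only on signaling NaN
           bits.append(0 if is_nan(x) else 1)
       return bits, invalid

   def ordered_mask_correct(va, vb):
       m0, f0 = vmfeq_self(va)
       m1, f1 = vmfeq_self(vb)
       return [a & b for a, b in zip(m0, m1)], f0 or f1    # vmand.mm

   def ordered_mask_tempting(va, vb):
       m0, f0 = vmfeq_self(va)
       m1, f1 = vmfeq_self(vb, mask=m0)                    # masked, no vmand
       return m1, f0 or f1

   QNAN, SNAN = 0x7FC00000, 0x7F800001
   print(ordered_mask_correct([QNAN], [SNAN]))    # ([0], True)
   print(ordered_mask_tempting([QNAN], [SNAN]))   # ([0], False)

Both variants produce the same mask, but only the three-instruction form examines the signaling NaN in `vb` and raises the invalid flag.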
  

**********************************************************************
Vector Floating-Point Classify Instruction
**********************************************************************

..
  This is a unary vector-vector instruction that operates in the same
  way as the scalar classify instruction.

This is a unary vector-vector instruction that operates in the same way as the scalar classify instruction.

..
  ----
      vfclass.v vd, vs2, vm   # Vector-vector
  ----


::

      vfclass.v vd, vs2, vm   # Vector-vector
  

..
  The 10-bit mask produced by this instruction is placed in the
  least-significant bits of the result elements.  The upper (SEW-10)
  bits of the result are filled with zeros. The instruction is only
  defined for SEW=16b and above, so the result will always fit in the
  destination elements.

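The 10-bit mask produced by this instruction is placed in the least-significant bits of the result elements.
The upper (SEW-10) bits of the result are filled with zeros.
The instruction is only defined for SEW=16b and above, so the result will always fit in the destination elements.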
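
For reference, a per-element sketch of the classification for SEW=32 is shown below. It assumes the bit assignment of the scalar `fclass` instruction from the F extension (bit 0 for negative infinity up to bit 9 for quiet NaN); the function name is hypothetical.

.. code-block:: python

   def vfclass32(bits: int) -> int:
       """Return the one-hot 10-bit class mask for an FP32 bit pattern."""
       sign = bits >> 31
       exp = (bits >> 23) & 0xFF
       frac = bits & 0x7FFFFF
       if exp == 0xFF:
           if frac == 0:
               bit = 0 if sign else 7               # -inf / +inf
           else:
               bit = 9 if frac & 0x400000 else 8    # quiet / signaling NaN
       elif exp == 0:
           if frac == 0:
               bit = 3 if sign else 4               # -0 / +0
           else:
               bit = 2 if sign else 5               # negative / positive subnormal
       else:
           bit = 1 if sign else 6                   # negative / positive normal
       return 1 << bit                              # upper SEW-10 bits stay zero

   print(bin(vfclass32(0x3F800000)))   # 0b1000000 (positive normal)
   print(bin(vfclass32(0x7FC00000)))   # 0b1000000000 (quiet NaN)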


**********************************************************************
Vector Floating-Point Merge Instruction
**********************************************************************

..
  A vector-scalar floating-point merge instruction is provided, which
  operates on all body elements, from `vstart` up to the current vector
  length in `vl` regardless of mask value.

A vector-scalar floating-point merge instruction is provided,
which operates on all body elements, from `vstart` up to the current vector length in `vl`, regardless of mask value.

..
  The `vfmerge.vfm` instruction is encoded as a masked instruction (`vm=0`).
  At elements where the mask value is zero, the first vector operand is
  copied to the destination element, otherwise a scalar floating-point
  register value is copied to the destination element.

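The `vfmerge.vfm` instruction is encoded as a masked instruction (`vm=0`).
At elements where the mask value is zero, the first vector operand is copied to the destination element; otherwise, a scalar floating-point register value is copied to the destination element.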

::

  vfmerge.vfm vd, vs2, rs1, v0  # vd[i] = v0.mask[i] ? f[rs1] : vs2[i]
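
The element-wise behavior can be sketched as follows. The Python function and its list-based register model are hypothetical, and elements outside the body are simply left unchanged here.

.. code-block:: python

   def vfmerge_vfm(vd, vs2, f_rs1, v0_mask, vstart, vl):
       """vd[i] = v0.mask[i] ? f[rs1] : vs2[i] for all body elements."""
       out = list(vd)
       for i in range(vstart, vl):
           out[i] = f_rs1 if v0_mask[i] else vs2[i]
       return out

   old_vd = [9.0, 9.0, 9.0, 9.0]
   print(vfmerge_vfm(old_vd, [1.0, 2.0, 3.0, 4.0], 0.5, [1, 0, 1, 0], 0, 3))
   # [0.5, 2.0, 0.5, 9.0]

Every body element is written regardless of the mask value; the mask only selects which source is copied.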
  

**********************************************************************
Vector Floating-Point Move Instruction
**********************************************************************

..
  The vector floating-point move instruction **splats** a floating-point
  scalar operand to a vector register group.  The instruction copies a
  scalar `f` register value to all active elements of a vector register
  group.  This instruction is encoded as a masked instruction (`vm=1`).
  The instruction must have the `vs2` field set to `v0`, with all other
  values for `vs2` reserved.

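The vector floating-point move instruction **splats** a floating-point scalar operand to a vector register group.
The instruction copies a scalar `f` register value to all active elements of a vector register group.
This instruction is encoded as an unmasked instruction (`vm=1`).
The instruction must have the `vs2` field set to `v0`, with all other values for `vs2` reserved.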

::

  vfmv.v.f vd, rs1  # vd[i] = f[rs1]
  

..
  NOTE: The `vfmv.v.f` instruction shares the encoding with the `vfmerge.vfm`
  instruction, but with `vm=1` and `vs2=v0`.

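.. note::

  The `vfmv.v.f` instruction shares the encoding with the `vfmerge.vfm` instruction, but with `vm=1` and `vs2=v0`.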
  
**********************************************************************
Single-Width Floating-Point/Integer Type-Convert Instructions
**********************************************************************

..
  Conversion operations are provided to convert to and from
  floating-point values and unsigned and signed integers, where both
  source and destination are SEW wide.

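Conversion operations are provided to convert to and from floating-point values and unsigned and signed integers, where both source and destination are SEW wide.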

..
  ----
  vfcvt.xu.f.v vd, vs2, vm       # Convert float to unsigned integer.
  vfcvt.x.f.v  vd, vs2, vm       # Convert float to signed integer.
  
  vfcvt.rtz.xu.f.v vd, vs2, vm   # Convert float to unsigned integer, truncating.
  vfcvt.rtz.x.f.v  vd, vs2, vm   # Convert float to signed integer, truncating.
  
  vfcvt.f.xu.v vd, vs2, vm       # Convert unsigned integer to float.
  vfcvt.f.x.v  vd, vs2, vm       # Convert signed integer to float.
  ----

::

  vfcvt.xu.f.v vd, vs2, vm       # Convert float to unsigned integer.
  vfcvt.x.f.v  vd, vs2, vm       # Convert float to signed integer.

  vfcvt.rtz.xu.f.v vd, vs2, vm   # Convert float to unsigned integer, truncating.
  vfcvt.rtz.x.f.v  vd, vs2, vm   # Convert float to signed integer, truncating.

  vfcvt.f.xu.v vd, vs2, vm       # Convert unsigned integer to float.
  vfcvt.f.x.v  vd, vs2, vm       # Convert signed integer to float.
  

..
  The conversions follow the same rules on exceptional conditions as the
  scalar conversion instructions.
  The conversions use the dynamic rounding mode in `frm`, except for the `rtz`
  variants, which round towards zero.

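The conversions follow the same rules on exceptional conditions as the scalar conversion instructions.
The conversions use the dynamic rounding mode in `frm`, except for the `rtz` variants, which round towards zero.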
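
The effect of the rounding mode on float-to-integer conversion can be illustrated with two small helpers. This sketch assumes that `frm` selects round-to-nearest-even for the dynamic case, and it ignores NaN inputs and out-of-range saturation; the helper names are hypothetical.

.. code-block:: python

   import math

   def cvt_x_f_rne(x: float) -> int:
       """Dynamic rounding, assuming frm is round-to-nearest-even."""
       return round(x)            # Python's round() rounds halves to even

   def cvt_rtz_x_f(x: float) -> int:
       """The rtz variants always round towards zero."""
       return math.trunc(x)

   values = (2.5, 3.5, -1.5, 2.7)
   print([cvt_x_f_rne(v) for v in values])   # [2, 4, -2, 3]
   print([cvt_rtz_x_f(v) for v in values])   # [2, 3, -1, 2]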

..
  NOTE: The `rtz` variants are provided to accelerate truncating conversions
  from floating-point to integer, as is common in languages like C and Java.

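.. note::

  The `rtz` variants are provided to accelerate truncating conversions from floating-point to integer,
  as is common in languages like C and Java.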
  

**********************************************************************
Widening Floating-Point/Integer Type-Convert Instructions
**********************************************************************

..
  A set of conversion instructions is provided to convert between
  narrower integer and floating-point datatypes to a type of twice the
  width.

A set of conversion instructions is provided to convert narrower integer and floating-point datatypes to a type of twice the width.


..
  ----
  vfwcvt.xu.f.v vd, vs2, vm       # Convert float to double-width unsigned integer.
  vfwcvt.x.f.v  vd, vs2, vm       # Convert float to double-width signed integer.
  
  vfwcvt.rtz.xu.f.v vd, vs2, vm   # Convert float to double-width unsigned integer, truncating.
  vfwcvt.rtz.x.f.v  vd, vs2, vm   # Convert float to double-width signed integer, truncating.
  
  vfwcvt.f.xu.v vd, vs2, vm       # Convert unsigned integer to double-width float.
  vfwcvt.f.x.v  vd, vs2, vm       # Convert signed integer to double-width float.
  
  vfwcvt.f.f.v vd, vs2, vm        # Convert single-width float to double-width float.
  ----

::

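  vfwcvt.xu.f.v vd, vs2, vm       # Convert float to double-width unsigned integer.
  vfwcvt.x.f.v  vd, vs2, vm       # Convert float to double-width signed integer.

  vfwcvt.rtz.xu.f.v vd, vs2, vm   # Convert float to double-width unsigned integer, truncating.
  vfwcvt.rtz.x.f.v  vd, vs2, vm   # Convert float to double-width signed integer, truncating.

  vfwcvt.f.xu.v vd, vs2, vm       # Convert unsigned integer to double-width float.
  vfwcvt.f.x.v  vd, vs2, vm       # Convert signed integer to double-width float.

  vfwcvt.f.f.v vd, vs2, vm        # Convert single-width float to double-width float.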
  

..
  These instructions have the same constraints on vector register overlap
  as other widening instructions (see :ref:`sec-widening` ).

These instructions have the same constraints on vector register overlap as other widening instructions (see :ref:`sec-widening`).

..
  NOTE: A double-width IEEE floating-point value can always represent a
  single-width integer exactly.

.. note::

  A double-width IEEE floating-point value can always represent a single-width integer exactly.
  
..
  NOTE: A double-width IEEE floating-point value can always represent a
  single-width IEEE floating-point value exactly.

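.. note::

  A double-width IEEE floating-point value can always represent a single-width IEEE floating-point value exactly.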
  
..
  NOTE: A full set of floating-point widening conversions is not
  supported as single instructions, but any widening conversion can be
  implemented as several doubling steps with equivalent results and no
  additional exception flags raised.

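.. note::

  A full set of floating-point widening conversions is not supported as single instructions,
  but any widening conversion can be implemented as several doubling steps
  with equivalent results and no additional exception flags raised.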
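
The exactness claims can be checked numerically with Python's `struct` module, whose ``e`` and ``f`` formats correspond to IEEE binary16 and binary32. This is only an illustrative check with hypothetical helper names; it compares a direct FP16 to FP64 conversion with one that goes through an intermediate FP32 value.

.. code-block:: python

   import struct

   def f16_bits_to_f64(bits16: int) -> float:
       """Direct widening: FP16 bit pattern to a Python float (FP64)."""
       return struct.unpack("<e", struct.pack("<H", bits16))[0]

   def f16_bits_to_f64_two_steps(bits16: int) -> float:
       """Widening in two doubling steps: FP16 -> FP32 -> FP64."""
       as_f32 = struct.pack("<f", f16_bits_to_f64(bits16))   # store as FP32
       return struct.unpack("<f", as_f32)[0]                 # widen to FP64

   # Every finite non-negative FP16 value survives the intermediate FP32 step.
   same = all(f16_bits_to_f64(b) == f16_bits_to_f64_two_steps(b)
              for b in range(0x7C00))
   print(same)                                               # True

   # 32-bit integers are exactly representable in FP64.
   print(all(int(float(n)) == n for n in (2**31 - 1, -2**31, 2**32 - 1)))   # True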
  

**********************************************************************
Narrowing Floating-Point/Integer Type-Convert Instructions
**********************************************************************

..
  A set of conversion instructions is provided to convert wider integer
  and floating-point datatypes to a type of half the width.

A set of conversion instructions is provided to convert wider integer and floating-point datatypes to a type of half the width.


..
  ----
  vfncvt.xu.f.w vd, vs2, vm       # Convert double-width float to unsigned integer.
  vfncvt.x.f.w  vd, vs2, vm       # Convert double-width float to signed integer.
  
  vfncvt.rtz.xu.f.w vd, vs2, vm   # Convert double-width float to unsigned integer, truncating.
  vfncvt.rtz.x.f.w  vd, vs2, vm   # Convert double-width float to signed integer, truncating.
  
  vfncvt.f.xu.w vd, vs2, vm       # Convert double-width unsigned integer to float.
  vfncvt.f.x.w  vd, vs2, vm       # Convert double-width signed integer to float.
  
  vfncvt.f.f.w vd, vs2, vm        # Convert double-width float to single-width float.
  vfncvt.rod.f.f.w vd, vs2, vm    # Convert double-width float to single-width float,
                                  #  rounding towards odd.
  ----


::

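  vfncvt.xu.f.w vd, vs2, vm       # Convert double-width float to unsigned integer.
  vfncvt.x.f.w  vd, vs2, vm       # Convert double-width float to signed integer.

  vfncvt.rtz.xu.f.w vd, vs2, vm   # Convert double-width float to unsigned integer, truncating.
  vfncvt.rtz.x.f.w  vd, vs2, vm   # Convert double-width float to signed integer, truncating.

  vfncvt.f.xu.w vd, vs2, vm       # Convert double-width unsigned integer to float.
  vfncvt.f.x.w  vd, vs2, vm       # Convert double-width signed integer to float.

  vfncvt.f.f.w vd, vs2, vm        # Convert double-width float to single-width float.
  vfncvt.rod.f.f.w vd, vs2, vm    # Convert double-width float to single-width float,
                                  #  rounding towards odd.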
  

..
  These instructions have the same constraints on vector register overlap
  as other narrowing instructions (see :ref:`sec-narrowing` ).

These instructions have the same constraints on vector register overlap as other narrowing instructions (see :ref:`sec-narrowing`).

..
  NOTE: A full set of floating-point widening conversions is not
  supported as single instructions. Conversions can be implemented in
  a sequence of halving steps.  Results are equivalently rounded and
  the same exception flags are raised if all but the last halving step
  use round-towards-odd (`vfncvt.rod.f.f.w`).  Only the final step
  should use the desired rounding mode.

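.. note::

  A full set of floating-point narrowing conversions is not supported as single instructions.
  Conversions can be implemented in a sequence of halving steps.
  Results are equivalently rounded and the same exception flags are raised if all but the last halving step use round-towards-odd (`vfncvt.rod.f.f.w`).
  Only the final step should use the desired rounding mode.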
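
The reason for round-towards-odd in the intermediate steps is double rounding. The sketch below narrows an FP64 value to FP16 in two halving steps and compares the result with a direct conversion. The helper names are hypothetical, the final step is assumed to use round-to-nearest-even, and the round-to-odd helper only handles finite values in the FP32 normal range.

.. code-block:: python

   import struct

   def f64_to_f32_rne(x: float) -> float:
       return struct.unpack("<f", struct.pack("<f", x))[0]

   def f64_to_f32_rod(x: float) -> float:
       """Round-to-odd: truncate, then force the LSB to 1 if inexact."""
       y = f64_to_f32_rne(x)
       if y == x:
           return y                              # exact, nothing to adjust
       bits = struct.unpack("<I", struct.pack("<f", y))[0]
       if abs(y) > abs(x):
           bits -= 1                             # step back to the truncated value
       bits |= 1                                 # inexact: set significand LSB
       return struct.unpack("<f", struct.pack("<I", bits))[0]

   def to_f16_rne(x: float) -> float:
       return struct.unpack("<e", struct.pack("<e", x))[0]

   x = 1 + 2**-11 + 2**-30                       # just above an FP16 tie point
   print(to_f16_rne(x))                          # 1.0009765625 (direct)
   print(to_f16_rne(f64_to_f32_rne(x)))          # 1.0 (double rounding error)
   print(to_f16_rne(f64_to_f32_rod(x)))          # 1.0009765625

Rounding the intermediate value to nearest loses the information that `x` lies above the tie point; round-to-odd keeps a sticky bit in the LSB, so the final rounding agrees with the direct conversion.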
  

######################################################################
Vector Reduction Operations
######################################################################

..
  Vector reduction operations take a vector register group of elements
  and a scalar held in element 0 of a vector register, and perform a
  reduction using some binary operator, to produce a scalar result in
  element 0 of a vector register.  The scalar input and output operands
  are held in element 0 of a single vector register, not a vector
  register group, so any vector register can be the scalar source or
  destination of a vector reduction regardless of LMUL setting.

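Vector reduction operations take a vector register group of elements and a scalar held in element 0 of a vector register,
and perform a reduction using some binary operator, to produce a scalar result in element 0 of a vector register.
The scalar input and output operands are held in element 0 of a single vector register, not a vector register group,
so any vector register can be the scalar source or destination of a vector reduction regardless of LMUL setting.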

..
  The destination vector register can overlap the source operands,
  including the mask register.

The destination vector register can overlap the source operands, including the mask register.

..
  NOTE: Reductions read and write the scalar operand and result into
  element 0 of a vector register to avoid a loss of decoupling with the
  scalar processor, and to support future polymorphic use with future
  types not supported in the scalar unit.

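.. note::

  Reductions read and write the scalar operand and result into element 0 of a vector register
  to avoid a loss of decoupling with the scalar processor,
  and to support future polymorphic use with future types not supported in the scalar unit.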
  
..
  Inactive elements from the source vector register group are excluded
  from the reduction, but the scalar operand is always included
  regardless of the mask values.

Inactive elements from the source vector register group are excluded from the reduction,
but the scalar operand is always included regardless of the mask values.

..
  The other elements in the destination vector register ( 0 < index <
  VLEN/SEW) are considered the tail and are managed with the current
  tail agnostic/undisturbed policy.

The other elements in the destination vector register (0 < index < VLEN/SEW) are considered the tail
and are managed with the current tail agnostic/undisturbed policy.

..
  If `vl`=0, no operation is performed and the destination register is
  not updated.

If `vl`=0, no operation is performed and the destination register is not updated.


..
  Traps on vector reduction instructions are always reported with a
  `vstart` of 0.  Vector reduction operations raise an illegal
  instruction exception if `vstart` is non-zero.

Traps on vector reduction instructions are always reported with a `vstart` of 0.
Vector reduction operations raise an illegal instruction exception if `vstart` is non-zero.

..
  The assembler syntax for a reduction operation is `vredop.vs`, where
  the `.vs` suffix denotes the first operand is a vector register group
  and the second operand is a scalar stored in element 0 of a vector
  register.

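The assembler syntax for a reduction operation is `vredop.vs`, where the `.vs` suffix denotes that the first operand is a vector register group
and the second operand is a scalar stored in element 0 of a vector register.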

.. _sec-vector-integer-reduce:

**********************************************************************
Vector Single-Width Integer Reduction Instructions
**********************************************************************

..
  All operands and results of single-width reduction instructions have
  the same SEW width.  Overflows wrap around on arithmetic sums.

All operands and results of single-width reduction instructions have the same SEW width.
Overflows wrap around on arithmetic sums.

..
  ----
      # Simple reductions, where [*] denotes all active elements:
      vredsum.vs  vd, vs2, vs1, vm   # vd[0] =  sum( vs1[0] , vs2[*] )
      vredmaxu.vs vd, vs2, vs1, vm   # vd[0] = maxu( vs1[0] , vs2[*] )
      vredmax.vs  vd, vs2, vs1, vm   # vd[0] =  max( vs1[0] , vs2[*] )
      vredminu.vs vd, vs2, vs1, vm   # vd[0] = minu( vs1[0] , vs2[*] )
      vredmin.vs  vd, vs2, vs1, vm   # vd[0] =  min( vs1[0] , vs2[*] )
      vredand.vs  vd, vs2, vs1, vm   # vd[0] =  and( vs1[0] , vs2[*] )
      vredor.vs   vd, vs2, vs1, vm   # vd[0] =   or( vs1[0] , vs2[*] )
      vredxor.vs  vd, vs2, vs1, vm   # vd[0] =  xor( vs1[0] , vs2[*] )
  ----


::

      # Simple reductions, where [*] denotes all active elements:
      vredsum.vs  vd, vs2, vs1, vm   # vd[0] =  sum( vs1[0] , vs2[*] )
      vredmaxu.vs vd, vs2, vs1, vm   # vd[0] = maxu( vs1[0] , vs2[*] )
      vredmax.vs  vd, vs2, vs1, vm   # vd[0] =  max( vs1[0] , vs2[*] )
      vredminu.vs vd, vs2, vs1, vm   # vd[0] = minu( vs1[0] , vs2[*] )
      vredmin.vs  vd, vs2, vs1, vm   # vd[0] =  min( vs1[0] , vs2[*] )
      vredand.vs  vd, vs2, vs1, vm   # vd[0] =  and( vs1[0] , vs2[*] )
      vredor.vs   vd, vs2, vs1, vm   # vd[0] =   or( vs1[0] , vs2[*] )
      vredxor.vs  vd, vs2, vs1, vm   # vd[0] =  xor( vs1[0] , vs2[*] )
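
As a sketch of the reduction semantics described in this chapter, the function below models `vredsum.vs` on Python lists. The function name and the use of ``None`` for "destination not updated" are assumptions made for illustration.

.. code-block:: python

   def vredsum_vs(vs2, vs1_0, mask, vl, sew):
       """Return the new vd[0], or None when vl == 0 (no update)."""
       if vl == 0:
           return None                        # no operation is performed
       acc = vs1_0                            # the scalar is always included
       for i in range(vl):
           if mask is None or mask[i]:        # inactive elements are excluded
               acc += vs2[i]
       return acc % (1 << sew)                # overflow wraps around

   print(vredsum_vs([200, 100, 7], 1, None, 3, 8))        # 52
   print(vredsum_vs([200, 100, 7], 1, [1, 0, 1], 3, 8))   # 208
   print(vredsum_vs([200, 100, 7], 1, [0, 0, 0], 3, 8))   # 1

With SEW=8, the unmasked sum 308 wraps to 52, and with all elements masked off the result is the scalar operand alone.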
  

.. _sec-vector-integer-reduce-widen:

**********************************************************************
Vector Widening Integer Reduction Instructions
**********************************************************************

..
  The unsigned `vwredsumu.vs` instruction zero-extends the SEW-wide
  vector elements before summing them, then adds the 2*SEW-width scalar
  element, and stores the result in a 2*SEW-width scalar element.

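The unsigned `vwredsumu.vs` instruction zero-extends the SEW-wide vector elements before summing them,
then adds the 2*SEW-width scalar element, and stores the result in a 2*SEW-width scalar element.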

..
  The `vwredsum.vs`  instruction sign-extends the SEW-wide vector
  elements before summing them.

The `vwredsum.vs` instruction sign-extends the SEW-wide vector elements before summing them.

..
  ----
      # Unsigned sum reduction into double-width accumulator
      vwredsumu.vs vd, vs2, vs1, vm   # 2*SEW = 2*SEW + sum(zero-extend(SEW))
  
      # Signed sum reduction into double-width accumulator
      vwredsum.vs  vd, vs2, vs1, vm   # 2*SEW = 2*SEW + sum(sign-extend(SEW))
  ----

::

      # Unsigned sum reduction into double-width accumulator
      vwredsumu.vs vd, vs2, vs1, vm   # 2*SEW = 2*SEW + sum(zero-extend(SEW))

      # Signed sum reduction into double-width accumulator
      vwredsum.vs  vd, vs2, vs1, vm   # 2*SEW = 2*SEW + sum(sign-extend(SEW))
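
The difference between the zero-extending and sign-extending forms can be seen with a small unmasked model. The function below is a hypothetical sketch that treats registers as Python lists of unsigned SEW-bit values.

.. code-block:: python

   def sign_extend(x: int, sew: int) -> int:
       """Interpret the low sew bits of x as a two's-complement value."""
       x &= (1 << sew) - 1
       return x - (1 << sew) if x >> (sew - 1) else x

   def vwredsum_vs(vs2, vs1_0, vl, sew, signed):
       """Return the 2*SEW-bit result element for vwredsum[u].vs."""
       acc = vs1_0                                 # 2*SEW-wide scalar input
       for i in range(vl):
           elem = vs2[i] & ((1 << sew) - 1)
           acc += sign_extend(elem, sew) if signed else elem
       return acc % (1 << (2 * sew))

   print(hex(vwredsum_vs([0xFF, 0xFF], 0, 2, 8, signed=False)))   # 0x1fe
   print(hex(vwredsum_vs([0xFF, 0xFF], 0, 2, 8, signed=True)))    # 0xfffe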
  

.. _sec-vector-float-reduce:

**********************************************************************
Vector Single-Width Floating-Point Reduction Instructions
**********************************************************************

..
  ----
      # Simple reductions.
      vfredosum.vs vd, vs2, vs1, vm # Ordered sum
      vfredusum.vs vd, vs2, vs1, vm # Unordered sum
      vfredmax.vs  vd, vs2, vs1, vm # Maximum value
      vfredmin.vs  vd, vs2, vs1, vm # Minimum value
  
  ----


::

      # Simple reductions.
      vfredosum.vs vd, vs2, vs1, vm # Ordered sum
      vfredusum.vs vd, vs2, vs1, vm # Unordered sum
      vfredmax.vs  vd, vs2, vs1, vm # Maximum value
      vfredmin.vs  vd, vs2, vs1, vm # Minimum value
  
  
..
  NOTE: Older assembler mnemonic `vfredsum` is retained as alias for `vfredusum`.

.. note::

  The older assembler mnemonic `vfredsum` is retained as an alias for `vfredusum`.
  
======================================================================
Vector Ordered Single-Width Floating-Point Sum Reduction
======================================================================

..
  The `vfredosum` instruction must sum the floating-point values in
  element order, starting with the scalar in `vs1[0]`--that is, it
  performs the computation:

The `vfredosum` instruction must sum the floating-point values in element order,
starting with the scalar in `vs1[0]`; that is, it performs the computation:

::

   vd[0] = (((vs1[0] + vs2[0]) + vs2[1]) + ...) + vs2[vl-1]
  
..
  where each addition operates identically to the scalar floating-point
  instructions in terms of raising exception flags and generating or
  propagating special values.

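where each addition operates identically to the scalar floating-point instructions in terms of raising exception flags and generating or propagating special values.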
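
The ordered sum can be modelled for SEW=32 by rounding to FP32 after every addition. This is a sketch with hypothetical helper names; it assumes round-to-nearest-even and does not model exception flags.

.. code-block:: python

   import struct

   def f32(x: float) -> float:
       """Round a Python float to the nearest FP32 value."""
       return struct.unpack("<f", struct.pack("<f", x))[0]

   def vfredosum(vs2, vs1_0, mask=None):
       """(((vs1[0] + vs2[0]) + vs2[1]) + ...) over the active elements."""
       acc = vs1_0
       for i, x in enumerate(vs2):
           if mask is None or mask[i]:
               acc = f32(acc + x)          # one rounded addition per element
       return acc

   big = 2.0 ** 24
   print(vfredosum([big, -big], 1.0))          # 0.0
   print(vfredosum([big, -big, 1.0], 0.0))     # 1.0

The two calls add the same three values in different orders and give different results, which is why the element order is fixed for this instruction.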

..
  NOTE: The ordered reduction supports compiler autovectorization, while
  the unordered FP sum allows for faster implementations.

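.. note::

  The ordered reduction supports compiler autovectorization, while the unordered FP sum allows for faster implementations.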
  
..
  When the operation is masked (`vm=0`), the masked-off elements do not
  affect the result or the exception flags.

When the operation is masked (`vm=0`), the masked-off elements do not affect the result or the exception flags.

..
  NOTE: If no elements are active, no additions are performed, so the scalar in
  `vs1[0]` is simply copied to the destination register, without canonicalizing
  NaN values and without setting any exception flags.  This behavior preserves
  the handling of NaNs, exceptions, and rounding when autovectorizing a scalar
  summation loop.

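.. note::

  If no elements are active, no additions are performed, so the scalar in `vs1[0]` is simply copied to the destination register,
  without canonicalizing NaN values and without setting any exception flags.
  This behavior preserves the handling of NaNs, exceptions, and rounding when autovectorizing a scalar summation loop.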
  

======================================================================
Vector Unordered Single-Width Floating-Point Sum Reduction
======================================================================

..
  The unordered sum reduction instruction, `vfredusum`, provides an
  implementation more freedom in performing the reduction.

The unordered sum reduction instruction, `vfredusum`, provides an implementation more freedom in performing the reduction.

..
  The implementation must produce a result equivalent to a reduction tree
  composed of binary operator nodes, with the inputs being elements from
  the source vector register group (`vs2`) and the source scalar value
  (`vs1[0]`).  Each operator in the tree accepts two inputs and produces
  one result.
  Each operator first computes an exact sum as a RISC-V scalar floating-point
  addition with infinite exponent range and precision, then converts this exact
  sum to a floating-point format with range and precision each at least as great
  as the element floating-point format indicated by SEW, rounding using the
  currently active floating-point dynamic rounding mode.
  A different floating-point range and precision may be chosen for the result of
  each operator.
  A node where one input is derived only from elements masked-off or beyond the
  active vector length may either treat that input as the additive identity of the
  appropriate EEW or simply copy the other input to its output.
  The rounded result from the root node in the tree is converted (rounded again,
  using the dynamic rounding mode) to the standard floating-point format
  indicated by SEW.
  An implementation
  is allowed to add an additional additive identity to the final result.

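The implementation must produce a result equivalent to a reduction tree composed of binary operator nodes,
with the inputs being elements from the source vector register group (`vs2`) and the source scalar value (`vs1[0]`).
Each operator in the tree accepts two inputs and produces one result.
Each operator first computes an exact sum as a RISC-V scalar floating-point addition with infinite exponent range and precision,
then converts this exact sum to a floating-point format with range and precision each at least as great as the element floating-point format indicated by SEW,
rounding using the currently active floating-point dynamic rounding mode.
A different floating-point range and precision may be chosen for the result of each operator.
A node where one input is derived only from elements masked-off or beyond the active vector length
may either treat that input as the additive identity of the appropriate EEW or simply copy the other input to its output.
The rounded result from the root node in the tree is converted (rounded again, using the dynamic rounding mode) to the standard floating-point format indicated by SEW.
An implementation is allowed to add an additional additive identity to the final result.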
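
One tree shape that fits this description is a balanced pairwise tree with the scalar added at the root. The sketch below is only one of many permitted shapes, not the required one; it rounds every node to FP32 (SEW=32) with round-to-nearest-even, and the helper names are hypothetical.

.. code-block:: python

   import struct

   def f32(x: float) -> float:
       """Round a Python float to the nearest FP32 value."""
       return struct.unpack("<f", struct.pack("<f", x))[0]

   def vfredusum_tree(vs2, vs1_0):
       """Pairwise tree over vs2, then add the scalar at the root."""
       nodes = list(vs2)
       while len(nodes) > 1:
           paired = [f32(nodes[i] + nodes[i + 1])
                     for i in range(0, len(nodes) - 1, 2)]
           if len(nodes) % 2:
               paired.append(nodes[-1])     # odd node joins one level up
           nodes = paired
       return f32(vs1_0 + nodes[0]) if nodes else vs1_0

   big = 2.0 ** 24
   vs2 = [1.0, big, 1.0, -big]
   print(vfredusum_tree(vs2, 0.0))          # 1.0

   ordered = 0.0
   for x in vs2:
       ordered = f32(ordered + x)           # element-order sum for comparison
   print(ordered)                           # 0.0

The tree result differs from the element-order sum of the same inputs; the tree shape here depends only on the number of elements, in line with the determinism requirement for a given `vtype` and `vl`.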

..
  The additive identity is +0.0 when rounding down (towards -{inf}) or
  -0.0 for all other rounding modes.

The additive identity is +0.0 when rounding down (towards -∞) or -0.0 for all other rounding modes.
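
The choice of -0.0 can be checked under round-to-nearest-even, which is the rounding Python uses for float addition. The helper below is a hypothetical sketch that tests whether adding a candidate value leaves both the value and the sign of zero unchanged.

.. code-block:: python

   import math

   def is_additive_identity(z: float, samples) -> bool:
       """True if x + z reproduces x, including the sign of zero."""
       return all(
           x + z == x and math.copysign(1.0, x + z) == math.copysign(1.0, x)
           for x in samples
       )

   samples = [0.0, -0.0, 1.5, -2.0]
   print(is_additive_identity(-0.0, samples))   # True
   print(is_additive_identity(0.0, samples))    # False

Adding +0.0 to -0.0 yields +0.0 in this rounding mode, so +0.0 would change the sign of a negative-zero input, while -0.0 leaves every input unchanged.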

..
  The reduction tree structure must be deterministic for a given value
  in `vtype` and `vl`.

The reduction tree structure must be deterministic for a given value in `vtype` and `vl`.

..
  NOTE: As a consequence of this definition, implementations need not propagate
  NaN payloads through the reduction tree when no elements are active. In
  particular, if no elements are active and the scalar input is NaN,
  implementations are permitted to canonicalize the NaN and, if the NaN is
  signaling, set the invalid exception flag.  Implementations are alternatively
  permitted to pass through the original NaN and set no exception flags, as with
  `vfredosum` .

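.. note::

  As a consequence of this definition, implementations need not propagate NaN payloads through the reduction tree when no elements are active.
  In particular, if no elements are active and the scalar input is NaN, implementations are permitted to canonicalize the NaN and, if the NaN is signaling, set the invalid exception flag.
  Implementations are alternatively permitted to pass through the original NaN and set no exception flags, as with `vfredosum`.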
  
..
  NOTE: The `vfredosum` instruction is a valid implementation of the
  `vfredusum` instruction.

.. note::

  The `vfredosum` instruction is a valid implementation of the `vfredusum` instruction.
  
======================================================================
Vector Single-Width Floating-Point Max and Min Reductions
======================================================================

..
  NOTE: Floating-point max and min reductions should return the same
  final value and raise the same exception flags regardless of operation
  order.

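.. note::

  Floating-point max and min reductions should return the same final value and raise the same exception flags regardless of operation order.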
  
..
  NOTE: If no elements are active, the scalar in `vs1[0]` is simply copied to
  the destination register, without canonicalizing NaN values and without
  setting any exception flags.

.. note::

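  If no elements are active, the scalar in `vs1[0]` is simply copied to the destination register, without canonicalizing NaN values and without setting any exception flags.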
  
.. _sec-vector-float-reduce-widen:

************************************************************
Vector Widening Floating-Point Reduction Instructions
************************************************************

..
  Widening forms of the sum reductions are provided that
  read and write a double-width reduction result.

Widening forms of the sum reductions are provided that read and write a double-width reduction result.

..
  ----
   # Simple reductions.
   vfwredosum.vs vd, vs2, vs1, vm # Ordered sum
   vfwredusum.vs vd, vs2, vs1, vm # Unordered sum
  ----

::

   # Simple reductions.
   vfwredosum.vs vd, vs2, vs1, vm # Ordered sum
   vfwredusum.vs vd, vs2, vs1, vm # Unordered sum
  

..
  NOTE: Older assembler mnemonic `vfwredsum` is retained as alias for `vfwredusum`.

.. note::

  The older assembler mnemonic `vfwredsum` is retained as an alias for `vfwredusum`.
  
..
  The reduction of the SEW-width elements is performed as in the
  single-width reduction case, with the elements in `vs2` promoted
  to 2*SEW bits before adding to the 2*SEW-bit accumulator.

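The reduction of the SEW-width elements is performed as in the single-width reduction case, with the elements in `vs2` promoted to 2*SEW bits before being added to the 2*SEW-bit accumulator.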

..
  NOTE: `vfwredosum.vs` handles inactive elements and NaN payloads analogously
  to `vfredosum.vs`; `vfwredusum.vs` does so analogously to `vfredusum.vs`.

.. note::

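  `vfwredosum.vs` handles inactive elements and NaN payloads analogously to `vfredosum.vs`; `vfwredusum.vs` does so analogously to `vfredusum.vs`.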
  
.. _sec-vector-mask:

############################################################
Vector Mask Instructions
############################################################

..
  Several instructions are provided to help operate on mask values held in
  a vector register.

Several instructions are provided to help operate on mask values held in a vector register.

.. _sec-mask-register-logical:

************************************************************
Vector Mask-Register Logical Instructions
************************************************************

..
  Vector mask-register logical operations operate on mask registers.
  Each element in a mask register is a single bit, so these instructions
  all operate on single vector registers regardless of the setting of
  the `vlmul` field in `vtype`.  They do not change the value of
  `vlmul`.  The destination vector register may be the same as either
  source vector register.

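Vector mask-register logical operations operate on mask registers.
Each element in a mask register is a single bit, so these instructions all operate on single vector registers regardless of the setting of the `vlmul` field in `vtype`.
They do not change the value of `vlmul`.
The destination vector register may be the same as either source vector register.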

..
  As with other vector instructions, the elements with indices less than
  `vstart` are unchanged, and `vstart` is reset to zero after execution.
  Vector mask logical instructions are always unmasked, so there are no
  inactive elements, and the encodings with `vm=0` are reserved.
  Mask elements past `vl`, the tail elements, are
  always updated with a tail-agnostic policy.

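As with other vector instructions, the elements with indices less than `vstart` are unchanged, and `vstart` is reset to zero after execution.
Vector mask logical instructions are always unmasked, so there are no inactive elements, and the encodings with `vm=0` are reserved.
Mask elements past `vl`, the tail elements, are always updated with a tail-agnostic policy.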

::

      vmand.mm vd, vs2, vs1     # vd.mask[i] =   vs2.mask[i] &&  vs1.mask[i]
      vmnand.mm vd, vs2, vs1    # vd.mask[i] = !(vs2.mask[i] &&  vs1.mask[i])
      vmandnot.mm vd, vs2, vs1  # vd.mask[i] =   vs2.mask[i] && !vs1.mask[i]
      vmxor.mm  vd, vs2, vs1    # vd.mask[i] =   vs2.mask[i] ^^  vs1.mask[i]
      vmor.mm  vd, vs2, vs1     # vd.mask[i] =   vs2.mask[i] ||  vs1.mask[i]
      vmnor.mm  vd, vs2, vs1    # vd.mask[i] = !(vs2.mask[i] ||  vs1.mask[i])
      vmornot.mm  vd, vs2, vs1  # vd.mask[i] =   vs2.mask[i] || !vs1.mask[i]
      vmxnor.mm vd, vs2, vs1    # vd.mask[i] = !(vs2.mask[i] ^^  vs1.mask[i])
  

..
  Several assembler pseudoinstructions are defined as shorthand for
  common uses of mask logical operations:

Several assembler pseudoinstructions are defined as shorthand for common uses of mask logical operations:

::

      vmmv.m vd, vs  => vmand.mm vd, vs, vs  # Copy mask register
      vmclr.m vd     => vmxor.mm vd, vd, vd   # Clear mask register
      vmset.m vd     => vmxnor.mm vd, vd, vd  # Set mask register
      vmnot.m vd, vs => vmnand.mm vd, vs, vs  # Invert bits
  

..
  NOTE: The vmmv.m instruction was previously called vmcpy.m, but with
  new layout it is more consistent to name as a "mv" because bits are
  copied without interpretation.  The vmcpy.m assembler
  pseudoinstruction can be retained for compatibility.

.. note::

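  The `vmmv.m` instruction was previously called `vmcpy.m`, but with the new layout it is more consistent to name it as a "mv" because bits are copied without interpretation.
  The `vmcpy.m` assembler pseudoinstruction can be retained for compatibility.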
  
..
  The set of eight mask logical instructions can generate any of the 16
  possibly binary logical functions of the two input masks:

The set of eight mask logical instructions can generate any of the 16 possible binary logical functions of the two input masks.
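
The claim can be checked by brute force. The following Python sketch models each instruction as a function of one `vs2` bit and one `vs1` bit, following the listing earlier in this section, and enumerates the truth tables reachable when the two operands are chosen from two input masks. The names ``OPS``, ``OPERANDS``, ``src1`` and ``src2`` are illustrative assumptions, not part of the specification.

.. code-block:: python

   from itertools import product

   # Each instruction as a function of (vs2 bit, vs1 bit).
   OPS = {
       "vmand":    lambda a, b: a & b,
       "vmnand":   lambda a, b: 1 - (a & b),
       "vmandnot": lambda a, b: a & (1 - b),
       "vmxor":    lambda a, b: a ^ b,
       "vmor":     lambda a, b: a | b,
       "vmnor":    lambda a, b: 1 - (a | b),
       "vmornot":  lambda a, b: a | (1 - b),
       "vmxnor":   lambda a, b: 1 - (a ^ b),
   }

   # Operand choices: either order, or the same register used twice.
   OPERANDS = {
       "src1, src2": lambda p, q: (p, q),
       "src2, src1": lambda p, q: (q, p),
       "src1, src1": lambda p, q: (p, p),
       "src2, src2": lambda p, q: (q, q),
   }

   def reachable_truth_tables():
       """Map each 4-entry truth table to one instruction/operand choice."""
       found = {}
       for (name, op), (regs, pick) in product(OPS.items(), OPERANDS.items()):
           table = tuple(op(*pick(p, q)) for p, q in product((0, 1), repeat=2))
           found.setdefault(table, f"{name}.mm vd, {regs}")
       return found

   tables = reachable_truth_tables()
   assert len(tables) == 16  # every binary function of two mask bits

The two constant functions come from using one register for both operands, which is how the `vmclr.m` and `vmset.m` pseudoinstructions are built.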

..


..


..
  NOTE: The vector mask logical instructions are designed to be easily
  fused with a following masked vector operation to effectively expand
  the number of predicate registers by moving values into `v0` before
  use.

.. note::

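  The vector mask logical instructions are designed to be easily fused with a following masked vector operation to effectively expand the number of predicate registers by moving values into `v0` before use.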
  

************************************************************
Vector mask population count `vpopc`
************************************************************

::

      vpopc.m rd, vs2, vm
  

..
  The source operand is a single vector register holding mask register
  values as described in Section :ref:`sec-mask-register-layout` .

The source operand is a single vector register holding mask register values as described in Section :ref:`sec-mask-register-layout`.

..
  The `vpopc.m` instruction counts the number of mask elements of the
  active elements of the vector source mask register that have the value
  1 and writes the result to a scalar `x` register.

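The `vpopc.m` instruction counts the number of mask elements of the active elements of the vector source mask register that have the value 1 and writes the result to a scalar `x` register.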

..
  The operation can be performed under a mask, in which case only the
  masked elements are counted.

The operation can be performed under a mask, in which case only the active (mask-enabled) elements are counted.

::

   vpopc.m rd, vs2, v0.t # x[rd] = sum_i ( vs2.mask[i] && v0.mask[i] )
  

..
  Traps on `vpopc.m` are always reported with a `vstart` of 0.  The
  `vpopc` instruction will raise an illegal instruction exception if
  `vstart` is non-zero.


Traps on `vpopc.m` are always reported with a `vstart` of 0.
The `vpopc` instruction will raise an illegal instruction exception if `vstart` is non-zero.
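
The counting rule can be sketched as a small Python model. It assumes mask registers are represented as lists of 0/1 indexed by element number; the function name ``vpopc`` and the use of ``ValueError`` to stand in for the illegal instruction exception are illustrative choices.

.. code-block:: python

   def vpopc(vs2, vl, v0=None, vstart=0):
       """Count set bits among the active elements 0..vl-1 of mask vs2."""
       if vstart != 0:
           raise ValueError("illegal instruction: vstart must be 0")
       return sum(
           1 for i in range(vl)
           if vs2[i] and (v0 is None or v0[i])
       )

   vs2 = [1, 0, 1, 1, 0, 1, 0, 0]   # index = element number
   v0 = [1, 1, 0, 1, 0, 0, 0, 0]
   assert vpopc(vs2, vl=8) == 4
   assert vpopc(vs2, vl=8, v0=v0) == 2   # only elements 0 and 3 qualify
   assert vpopc(vs2, vl=3) == 2          # elements at or past vl are not counted
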


************************************************************
`vfirst` find-first-set mask bit
************************************************************

::

      vfirst.m rd, vs2, vm
  

..
  The `vfirst` instruction finds the lowest-numbered active element of
  the source mask vector that has the value 1 and writes that element's
  index to a GPR.  If no active element has the value 1, -1 is written
  to the GPR.

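The `vfirst` instruction finds the lowest-numbered active element of the source mask vector that has the value 1 and writes that element's index to a GPR.
If no active element has the value 1, -1 is written to the GPR.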

..
  NOTE: Software can assume that any negative value (highest bit set)
  corresponds to no element found, as vector lengths will never exceed
  2^(XLEN-1)^ on any implementation.

Software can assume that any negative value (highest bit set) corresponds to no element found,
as vector lengths will never exceed 2^(XLEN-1) on any implementation.

..
  Traps on `vfirst` are always reported with a `vstart` of 0.  The
  `vfirst` instruction will raise an illegal instruction exception if
  `vstart` is non-zero.

Traps on `vfirst` are always reported with a `vstart` of 0.
The `vfirst` instruction will raise an illegal instruction exception if `vstart` is non-zero.
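
A minimal Python sketch of the search and of the software-side test for "not found" follows. Masks are assumed to be lists of 0/1 indexed by element number, and the helper name ``found`` is hypothetical.

.. code-block:: python

   def vfirst(vs2, vl, v0=None):
       """Index of the lowest-numbered active element equal to 1, else -1."""
       for i in range(vl):
           if vs2[i] and (v0 is None or v0[i]):
               return i
       return -1

   def found(result: int) -> bool:
       """Software-side check: any negative value means no element was found."""
       return result >= 0

   vs2 = [0, 0, 1, 0, 1, 0, 0, 0]
   assert vfirst(vs2, vl=8) == 2
   assert vfirst(vs2, vl=8, v0=[1, 1, 0, 1, 1, 1, 1, 1]) == 4   # element 2 is masked off
   assert vfirst(vs2, vl=2) == -1 and not found(vfirst(vs2, vl=2))
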


************************************************************
`vmsbf.m` set-before-first mask bit
************************************************************

..
  ----
      vmsbf.m vd, vs2, vm
  
   # Example
  
       7 6 5 4 3 2 1 0   Element number
  
       1 0 0 1 0 1 0 0   v3 contents
                         vmsbf.m v2, v3
       0 0 0 0 0 0 1 1   v2 contents
  
       1 0 0 1 0 1 0 1   v3 contents
                         vmsbf.m v2, v3
       0 0 0 0 0 0 0 0   v2
  
       0 0 0 0 0 0 0 0   v3 contents
                         vmsbf.m v2, v3
       1 1 1 1 1 1 1 1   v2
  
       1 1 0 0 0 0 1 1   v0 vcontents
       1 0 0 1 0 1 0 0   v3 contents
                         vmsbf.m v2, v3, v0.t
       0 1 x x x x 1 1   v2 contents
  ----


::

      vmsbf.m vd, vs2, vm
  
   # Example

       7 6 5 4 3 2 1 0   Element number

       1 0 0 1 0 1 0 0   v3 contents
                         vmsbf.m v2, v3
       0 0 0 0 0 0 1 1   v2 contents

       1 0 0 1 0 1 0 1   v3 contents
                         vmsbf.m v2, v3
       0 0 0 0 0 0 0 0   v2 contents

       0 0 0 0 0 0 0 0   v3 contents
                         vmsbf.m v2, v3
       1 1 1 1 1 1 1 1   v2 contents

       1 1 0 0 0 0 1 1   v0 contents
       1 0 0 1 0 1 0 0   v3 contents
                         vmsbf.m v2, v3, v0.t
       0 1 x x x x 1 1   v2 contents
  

..
  The `vmsbf.m` instruction takes a mask register as input and writes
  results to a mask register.  The instruction writes a 1 to all active
  mask elements before the first source element that is a 1, then
  writes a 0 to that element and all following active elements.  If
  there is no set bit in the source vector, then all active elements in
  the destination are written with a 1.

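The `vmsbf.m` instruction takes a mask register as input and writes results to a mask register.
The instruction writes a 1 to all active mask elements before the first source element that is a 1,
then writes a 0 to that element and all following active elements.
If there is no set bit in the source vector, then all active elements in the destination are written with a 1.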
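
The rule can be modelled in a few lines of Python and checked against the example rows of this section. The sketch assumes that only active source elements take part in the search for the first 1, which is what the masked example implies; ``bits`` is a hypothetical helper that parses a row written from element 7 down to element 0, with ``x`` standing for an element the model leaves unwritten.

.. code-block:: python

   def bits(text):
       """Parse an 'element 7 ... element 0' row into a list indexed by element."""
       return [None if t == "x" else int(t) for t in reversed(text.split())]

   def vmsbf(vs2, vl, v0=None):
       """Set-before-first; inactive elements are left as None (not written)."""
       out = [None] * vl
       seen = False                      # has an active source 1 been seen?
       for i in range(vl):
           if v0 is not None and not v0[i]:
               continue                  # inactive: destination not written
           seen = seen or bool(vs2[i])
           out[i] = 0 if seen else 1
       return out

   assert vmsbf(bits("1 0 0 1 0 1 0 0"), 8) == bits("0 0 0 0 0 0 1 1")
   assert vmsbf(bits("1 0 0 1 0 1 0 1"), 8) == bits("0 0 0 0 0 0 0 0")
   assert vmsbf(bits("0 0 0 0 0 0 0 0"), 8) == bits("1 1 1 1 1 1 1 1")
   assert vmsbf(bits("1 0 0 1 0 1 0 0"), 8,
                v0=bits("1 1 0 0 0 0 1 1")) == bits("0 1 x x x x 1 1")
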

..
  The tail elements in the destination mask register are updated under a
  tail-agnostic policy.

The tail elements in the destination mask register are updated under a tail-agnostic policy.

..
  Traps on `vmsbf.m` are always reported with a `vstart` of 0.  The
  `vmsbf` instruction will raise an illegal instruction exception if
  `vstart` is non-zero.

Traps on `vmsbf.m` are always reported with a `vstart` of 0.
The `vmsbf` instruction will raise an illegal instruction exception if `vstart` is non-zero.

..
  The destination register cannot overlap the source register
  and, if masked, cannot overlap the mask register ('v0').

The destination register cannot overlap the source register and,
if masked, cannot overlap the mask register (`v0`).


************************************************************
`vmsif.m` set-including-first mask bit
************************************************************

..
  The vector mask set-including-first instruction is similar to
  set-before-first, except it also includes the element with a set bit.

The vector mask set-including-first instruction is similar to set-before-first,
except it also includes the element with a set bit.

..
  ----
      vmsif.m vd, vs2, vm
  
   # Example
  
       7 6 5 4 3 2 1 0   Element number
  
       1 0 0 1 0 1 0 0   v3 contents
                         vmsif.m v2, v3
       0 0 0 0 0 1 1 1   v2 contents
  
       1 0 0 1 0 1 0 1   v3 contents
                         vmsif.m v2, v3
       0 0 0 0 0 0 0 1   v2
  
       1 1 0 0 0 0 1 1   v0 vcontents
       1 0 0 1 0 1 0 0   v3 contents
                         vmsif.m v2, v3, v0.t
       1 1 x x x x 1 1   v2 contents
  ----


::

      vmsif.m vd, vs2, vm
  
   # Example

       7 6 5 4 3 2 1 0   Element number

       1 0 0 1 0 1 0 0   v3 contents
                         vmsif.m v2, v3
       0 0 0 0 0 1 1 1   v2 contents

       1 0 0 1 0 1 0 1   v3 contents
                         vmsif.m v2, v3
       0 0 0 0 0 0 0 1   v2 contents

       1 1 0 0 0 0 1 1   v0 contents
       1 0 0 1 0 1 0 0   v3 contents
                         vmsif.m v2, v3, v0.t
       1 1 x x x x 1 1   v2 contents
  

..
  The tail elements in the destination mask register are updated under a
  tail-agnostic policy.

The tail elements in the destination mask register are updated under a tail-agnostic policy.

..
  Traps on `vmsif.m` are always reported with a `vstart` of 0.  The
  `vmsif` instruction will raise an illegal instruction exception if
  `vstart` is non-zero.

Traps on `vmsif.m` are always reported with a `vstart` of 0.
The `vmsif` instruction will raise an illegal instruction exception if `vstart` is non-zero.

..
  The destination register cannot overlap the source register
  and, if masked, cannot overlap the mask register ('v0').

The destination register cannot overlap the source register and,
if masked, cannot overlap the mask register (`v0`).
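
As a rough Python model of set-including-first, checked against the example rows of this section: the sketch assumes that only active source elements are searched, as the masked example implies, and ``bits`` is a hypothetical helper that parses a row written from element 7 down to element 0.

.. code-block:: python

   def bits(text):
       """Parse an 'element 7 ... element 0' row into a list indexed by element."""
       return [None if t == "x" else int(t) for t in reversed(text.split())]

   def vmsif(vs2, vl, v0=None):
       """Set-including-first; inactive elements are left as None (not written)."""
       out = [None] * vl
       done = False                      # True once the first active 1 has passed
       for i in range(vl):
           if v0 is not None and not v0[i]:
               continue
           out[i] = 0 if done else 1     # the first set element itself still gets 1
           done = done or bool(vs2[i])
       return out

   assert vmsif(bits("1 0 0 1 0 1 0 0"), 8) == bits("0 0 0 0 0 1 1 1")
   assert vmsif(bits("1 0 0 1 0 1 0 1"), 8) == bits("0 0 0 0 0 0 0 1")
   assert vmsif(bits("1 0 0 1 0 1 0 0"), 8,
                v0=bits("1 1 0 0 0 0 1 1")) == bits("1 1 x x x x 1 1")
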


************************************************************
`vmsof.m` set-only-first mask bit
************************************************************

..
  The vector mask set-only-first instruction is similar to
  set-before-first, except it only sets the first element with a bit
  set, if any.

The vector mask set-only-first instruction is similar to set-before-first,
except it only sets the first element with a bit set, if any.

..
  ----
      vmsof.m vd, vs2, vm
  
   # Example
  
       7 6 5 4 3 2 1 0   Element number
  
       1 0 0 1 0 1 0 0   v3 contents
                         vmsof.m v2, v3
       0 0 0 0 0 1 0 0   v2 contents
  
       1 0 0 1 0 1 0 1   v3 contents
                         vmsof.m v2, v3
       0 0 0 0 0 0 0 1   v2
  
       1 1 0 0 0 0 1 1   v0 vcontents
       1 1 0 1 0 1 0 0   v3 contents
                         vmsof.m v2, v3, v0.t
       0 1 x x x x 0 0   v2 contents
  ----


::

      vmsof.m vd, vs2, vm
  
   # Example

       7 6 5 4 3 2 1 0   Element number

       1 0 0 1 0 1 0 0   v3 contents
                         vmsof.m v2, v3
       0 0 0 0 0 1 0 0   v2 contents

       1 0 0 1 0 1 0 1   v3 contents
                         vmsof.m v2, v3
       0 0 0 0 0 0 0 1   v2 contents

       1 1 0 0 0 0 1 1   v0 contents
       1 1 0 1 0 1 0 0   v3 contents
                         vmsof.m v2, v3, v0.t
       0 1 x x x x 0 0   v2 contents
  

..
  The tail elements in the destination mask register are updated under a
  tail-agnostic policy.

The tail elements in the destination mask register are updated under a tail-agnostic policy.

..
  Traps on `vmsof.m` are always reported with a `vstart` of 0.  The
  `vmsof` instruction will raise an illegal instruction exception if
  `vstart` is non-zero.

Traps on `vmsof.m` are always reported with a `vstart` of 0.
The `vmsof` instruction will raise an illegal instruction exception if `vstart` is non-zero.

..
  The destination register cannot overlap the source register
  and, if masked, cannot overlap the mask register ('v0').

The destination register cannot overlap the source register and,
if masked, cannot overlap the mask register (`v0`).
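
Set-only-first can be sketched in Python in the same style and checked against the example rows of this section. The model assumes that only active source elements are searched, as the masked example implies; ``bits`` is a hypothetical helper that parses a row written from element 7 down to element 0.

.. code-block:: python

   def bits(text):
       """Parse an 'element 7 ... element 0' row into a list indexed by element."""
       return [None if t == "x" else int(t) for t in reversed(text.split())]

   def vmsof(vs2, vl, v0=None):
       """Set-only-first; inactive elements are left as None (not written)."""
       out = [None] * vl
       done = False                      # True once the first active 1 was found
       for i in range(vl):
           if v0 is not None and not v0[i]:
               continue
           first = bool(vs2[i]) and not done
           out[i] = 1 if first else 0
           done = done or first
       return out

   assert vmsof(bits("1 0 0 1 0 1 0 0"), 8) == bits("0 0 0 0 0 1 0 0")
   assert vmsof(bits("1 0 0 1 0 1 0 1"), 8) == bits("0 0 0 0 0 0 0 1")
   assert vmsof(bits("1 1 0 1 0 1 0 0"), 8,
                v0=bits("1 1 0 0 0 0 1 1")) == bits("0 1 x x x x 0 0")
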


************************************************************
Example using vector mask instructions
************************************************************

..
  The following is an example of vectorizing a data-dependent exit loop.

The following is an example of vectorizing a data-dependent exit loop.

::

  include::example/strcpy.s[lines=4..-1]
  
::

  include::example/strncpy.s[lines=4..-1]
  

************************************************************
Vector Iota Instruction
************************************************************

..
  The `viota.m` instruction reads a source vector mask register and
  writes to each element of the destination vector register group the
  sum of all the bits of elements in the mask register
  whose index is less than the element, e.g., a parallel prefix sum of
  the mask values.

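The `viota.m` instruction reads a source vector mask register and writes to each element of the destination vector register group
the sum of all the bits of elements in the mask register whose index is less than the element,
that is, a parallel prefix sum of the mask values.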

..
  This instruction can be masked, in which case only the enabled
  elements contribute to the sum.

This instruction can be masked, in which case only the enabled elements contribute to the sum.

..
  ----
   viota.m vd, vs2, vm
  
   # Example
  
       7 6 5 4 3 2 1 0   Element number
  
       1 0 0 1 0 0 0 1   v2 contents
                         viota.m v4, v2 # Unmasked
       2 2 2 1 1 1 1 0   v4 result
  
       1 1 1 0 1 0 1 1   v0 contents
       1 0 0 1 0 0 0 1   v2 contents
       2 3 4 5 6 7 8 9   v4 contents
                         viota.m v4, v2, v0.t # Masked, vtype.vma=0
       1 1 1 5 1 7 1 0   v4 results
  ----

::

   viota.m vd, vs2, vm
  
   # Example

       7 6 5 4 3 2 1 0   Element number

       1 0 0 1 0 0 0 1   v2 contents
                         viota.m v4, v2 # Unmasked
       2 2 2 1 1 1 1 0   v4 result

       1 1 1 0 1 0 1 1   v0 contents
       1 0 0 1 0 0 0 1   v2 contents
       2 3 4 5 6 7 8 9   v4 contents
                         viota.m v4, v2, v0.t # Masked, vtype.vma=0
       1 1 1 5 1 7 1 0   v4 results
  

..
  The result value is zero-extended to fill the destination element if
  SEW is wider than the result.  If the result value would overflow the
  destination SEW, the least-significant SEW bits are retained.

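The result value is zero-extended to fill the destination element if SEW is wider than the result.
If the result value would overflow the destination SEW, the least-significant SEW bits are retained.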
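
The prefix sum and the SEW truncation can be sketched as a Python model and compared with the example rows of this section. The sketch assumes mask-undisturbed behaviour for inactive elements (``vtype.vma=0``, as in the masked example); the parameter ``old`` (previous destination contents) and the helper ``row`` are hypothetical names.

.. code-block:: python

   def row(text):
       """Parse an 'element 7 ... element 0' row into a list indexed by element."""
       return [int(t) for t in reversed(text.split())]

   def viota(vs2, vl, sew, v0=None, old=None):
       """Prefix count of active set mask bits, truncated to SEW bits."""
       out = list(old) if old is not None else [None] * vl
       count = 0
       for i in range(vl):
           if v0 is not None and not v0[i]:
               continue                    # inactive: left undisturbed (vma=0)
           out[i] = count % (1 << sew)     # keep the least-significant SEW bits
           count += vs2[i]
       return out

   v2 = row("1 0 0 1 0 0 0 1")
   assert viota(v2, 8, sew=32) == row("2 2 2 1 1 1 1 0")
   assert viota(v2, 8, sew=32, v0=row("1 1 1 0 1 0 1 1"),
                old=row("2 3 4 5 6 7 8 9")) == row("1 1 1 5 1 7 1 0")

   # With SEW=8 the count wraps around after 255.
   wrapped = viota([1] * 300, 300, sew=8)
   assert wrapped[255] == 255 and wrapped[256] == 0
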

..
  Traps on `viota.m` are always reported with a `vstart` of 0, and
  execution is always restarted from the beginning when resuming after a
  trap handler.  An illegal instruction exception is raised if `vstart`
  is non-zero.

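Traps on `viota.m` are always reported with a `vstart` of 0, and execution is always restarted from the beginning when resuming after a trap handler.
An illegal instruction exception is raised if `vstart` is non-zero.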

..
  The destination register group cannot overlap the source register
  and, if masked, cannot overlap the mask register (`v0`).

The destination register group cannot overlap the source register and, if masked, cannot overlap the mask register (`v0`).

..
  NOTE: These constraints exist for two reasons.  First, to simplify
  avoidance of WAR hazards in implementations with temporally long vector
  registers and no vector register renaming.  Second, to enable resuming
  execution after a trap simpler.

.. note::

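  These constraints exist for two reasons.
  First, to simplify avoidance of WAR hazards in implementations with temporally long vector registers and no vector register renaming.
  Second, to make resuming execution after a trap simpler.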
  
..
  The `viota.m` instruction can be combined with memory scatter
  instructions (indexed stores) to perform vector compress functions.

The `viota.m` instruction can be combined with memory scatter instructions (indexed stores)
to perform vector compress functions.
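
Before reading the assembly, it may help to see the same strip-mined compress loop as a scalar Python model. This is only a sketch: ``vlmax`` is a hypothetical stand-in for the hardware vector length, `vl` is assumed to be ``min(vlmax, remaining)``, and each step is labelled with the vector instruction it imitates.

.. code-block:: python

   def compact_non_zero(data, vlmax=4):
       """Model of the strip-mined loop: returns (count, output list)."""
       out = []
       pos = 0
       while pos < len(data):
           vl = min(vlmax, len(data) - pos)          # vsetvli
           v8 = data[pos:pos + vl]                   # vle32.v
           v0 = [int(x != 0) for x in v8]            # vmsne.vi
           # viota.m: offset of each element = number of earlier set mask bits
           v16 = [sum(v0[:i]) for i in range(vl)]
           chunk = [None] * sum(v0)                  # vpopc.m gives the chunk size
           for i in range(vl):                       # vsuxei32.v under mask
               if v0[i]:
                   chunk[v16[i]] = v8[i]
           out.extend(chunk)                         # bump output pointer
           pos += vl                                 # bump input pointer
       return len(out), out

   assert compact_non_zero([0, 3, 0, 0, 7, 8, 0, 1, 0]) == (4, [3, 7, 8, 1])

Because the `viota.m` offsets restart at zero in every strip, the output pointer has to advance by the `vpopc.m` count after each iteration.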

..
  ----
      # Compact non-zero elements from input memory array to output memory array
      #
      # size*t compact*non*zero(size*t n, const int* in, int* out)
      # {
      #   size**t i;
      #   size**t count = 0;
      #   int *p = out;
      #
      #   for (i=0; i<n; i++)
      #   {
      #       const int v = *in++;
      #       if (v != 0)
      #           *p++ = v;
      #   }
      #
      #   return (size**t) (p - out);
      # }
      #
      # a0 = n
      # a1 = &in
      # a2 = &out
  
  compact*non*zero:
      li a6, 0                      # Clear count of non-zero elements
  loop:
      vsetvli a5, a0, e32, m8, ta, ma   # 32-bit integers
      vle32.v v8, (a1)               # Load input vector
        sub a0, a0, a5               # Decrement number done
        slli a5, a5, 2               # Multiply by four bytes
      vmsne.vi v0, v8, 0             # Locate non-zero values
        add a1, a1, a5               # Bump input pointer
      vpopc.m a5, v0                 # Count number of elements set in v0
      viota.m v16, v0                # Get destination offsets of active elements
        add a6, a6, a5               # Accumulate number of elements
      vsll.vi v16, v16, 2, v0.t      # Multiply offsets by four bytes
        slli a5, a5, 2               # Multiply number of non-zero elements by four bytes
      vsuxei32.v v8, (a2), v16, v0.t # Scatter using scaled viota results under mask
        add a2, a2, a5               # Bump output pointer
        bnez a0, loop                # Any more?
  
        mv a0, a6                    # Return count
        ret
  ----


::

      # Compact non-zero elements from input memory array to output memory array
      #
      # size_t compact_non_zero(size_t n, const int* in, int* out)
      # {
      #   size_t i;
      #   size_t count = 0;
      #   int *p = out;
      #
      #   for (i=0; i<n; i++)
      #   {
      #       const int v = *in++;
      #       if (v != 0)
      #           *p++ = v;
      #   }
      #
      #   return (size_t) (p - out);
      # }
      #
      # a0 = n
      # a1 = &in
      # a2 = &out

  compact_non_zero:
      li a6, 0                      # Clear count of non-zero elements
  loop:
      vsetvli a5, a0, e32, m8, ta, ma   # 32-bit integers
      vle32.v v8, (a1)               # Load input vector
        sub a0, a0, a5               # Decrement number done
        slli a5, a5, 2               # Multiply by four bytes
      vmsne.vi v0, v8, 0             # Locate non-zero values
        add a1, a1, a5               # Bump input pointer
      vpopc.m a5, v0                 # Count number of elements set in v0
      viota.m v16, v0                # Get destination offsets of active elements
        add a6, a6, a5               # Accumulate number of elements
      vsll.vi v16, v16, 2, v0.t      # Multiply offsets by four bytes
        slli a5, a5, 2               # Multiply number of non-zero elements by four bytes
      vsuxei32.v v8, (a2), v16, v0.t # Scatter using scaled viota results under mask
        add a2, a2, a5               # Bump output pointer
        bnez a0, loop                # Any more?

        mv a0, a6                    # Return count
        ret
  

..
  The `vid.v` instruction writes each element's index to the
  destination vector register group, from 0 to `vl`-1.

The `vid.v` instruction writes each element's index to the destination vector register group, from 0 to `vl`-1.

..
  ----
      vid.v vd, vm  # Write element ID to destination.
  ----


::

      vid.v vd, vm  # Write element ID to destination.
  

..
  The instruction can be masked.

The instruction can be masked.

..
  The `vs2` field of the instruction must be set to `v0`, otherwise the
  encoding is *reserved*.

The `vs2` field of the instruction must be set to `v0`,
otherwise the encoding is *reserved*.

..
  The result value is zero-extended to fill the destination element if
  SEW is wider than the result.  If the result value would overflow the
  destination SEW, the least-significant SEW bits are retained.

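The result value is zero-extended to fill the destination element if SEW is wider than the result.
If the result value would overflow the destination SEW, the least-significant SEW bits are retained.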

..
  NOTE: Microarchitectures can implement `vid.v` instruction using the
  same datapath as `viota.m` but with an implicit set mask source.

.. note::

  Microarchitectures can implement the `vid.v` instruction using the same datapath as `viota.m` but with an implicit set mask source.
  
.. _sec-vector-permute:

############################################################
Vector Permutation Instructions
############################################################

..
  A range of permutation instructions are provided to move elements
  around within the vector registers.

A range of permutation instructions are provided to move elements around within the vector registers.


************************************************************
Integer Scalar Move Instructions
************************************************************

..
  The integer scalar read/write instructions transfer a single
  value between a scalar `x` register and element 0 of a vector
  register.  The instructions ignore LMUL and vector register groups.

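The integer scalar read/write instructions transfer a single value between a scalar `x` register and element 0 of a vector register.
The instructions ignore LMUL and vector register groups.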

::

  vmv.x.s rd, vs2  # x[rd] = vs2[0] (vs1=0)
  vmv.s.x vd, rs1  # vd[0] = x[rs1] (vs2=0)
  

..
  The `vmv.x.s` instruction copies a single SEW-wide element from index 0 of the
  source vector register to a destination integer register.  If SEW > XLEN, the
  least-significant XLEN bits are transferred and the upper SEW-XLEN bits are
  ignored.  If SEW < XLEN, the value is sign-extended to XLEN bits.

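The `vmv.x.s` instruction copies a single SEW-wide element from index 0 of the source vector register to a destination integer register.
If SEW > XLEN, the least-significant XLEN bits are transferred and the upper SEW-XLEN bits are ignored.
If SEW < XLEN, the value is sign-extended to XLEN bits.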

..
  The `vmv.s.x` instruction copies the scalar integer register to element 0 of
  the destination vector register.  If SEW < XLEN, the least-significant bits
  are copied and the upper XLEN-SEW bits are ignored.  If SEW > XLEN, the value
  is sign-extended to SEW bits.  The other elements in the destination vector
  register ( 0 < index < VLEN/SEW) are treated as tail elements using the current tail agnostic/undisturbed policy.  If `vstart` {ge} `vl`, no
  operation is performed and the destination register is not updated.

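The `vmv.s.x` instruction copies the scalar integer register to element 0 of the destination vector register.
If SEW < XLEN, the least-significant bits are copied and the upper XLEN-SEW bits are ignored.
If SEW > XLEN, the value is sign-extended to SEW bits. The other elements in the destination vector register (0 < index < VLEN/SEW)
are treated as tail elements using the current tail agnostic/undisturbed policy.
If `vstart` ≥ `vl`, no operation is performed and the destination register is not updated.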
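
The width rules for both directions can be sketched in Python on raw bit patterns. The function names and the explicit ``sew`` and ``xlen`` parameters are illustrative assumptions, and the `vstart`/`vl` check and the tail policy are not modelled.

.. code-block:: python

   def sign_extend(value, from_bits, to_bits):
       """Treat the low from_bits of value as signed; return the to_bits pattern."""
       value &= (1 << from_bits) - 1
       if value >> (from_bits - 1):
           value -= 1 << from_bits
       return value & ((1 << to_bits) - 1)

   def vmv_x_s(elem0, sew, xlen):
       """x[rd] = vs2[0]: truncate if SEW > XLEN, sign-extend if SEW < XLEN."""
       if sew >= xlen:
           return elem0 & ((1 << xlen) - 1)
       return sign_extend(elem0, sew, xlen)

   def vmv_s_x(x, sew, xlen):
       """vd[0] = x[rs1]: truncate if SEW < XLEN, sign-extend if SEW > XLEN."""
       if sew <= xlen:
           return x & ((1 << sew) - 1)
       return sign_extend(x, xlen, sew)

   assert vmv_x_s(0x80, sew=8, xlen=32) == 0xFFFF_FF80
   assert vmv_x_s(0x1_2345_6789, sew=64, xlen=32) == 0x2345_6789
   assert vmv_s_x(0xFFFF_FF80, sew=8, xlen=32) == 0x80
   assert vmv_s_x(0x8000_0000, sew=64, xlen=32) == 0xFFFF_FFFF_8000_0000
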

..
  NOTE: As a consequence, when `vl`=0, no elements are updated in the
  destination vector register group, regardless of `vstart`.

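.. note::

  As a consequence, when `vl`\ =0, no elements are updated in the destination vector register group, regardless of `vstart`.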
  
..
  The encodings corresponding to the masked versions (`vm=0`) of `vmv.x.s`
  and `vmv.s.x` are reserved.

The encodings corresponding to the masked versions (`vm=0`) of `vmv.x.s` and `vmv.s.x` are reserved.


============================================================
Floating-Point Scalar Move Instructions
============================================================

..
  The floating-point scalar read/write instructions transfer a single
  value between a scalar `f` register and element 0 of a vector
  register.  The instructions ignore LMUL and vector register groups.

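The floating-point scalar read/write instructions transfer a single value between a scalar `f` register and element 0 of a vector register.
The instructions ignore LMUL and vector register groups.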

::

  vfmv.f.s rd, vs2  # f[rd] = vs2[0] (rs1=0)
  vfmv.s.f vd, rs1  # vd[0] = f[rs1] (vs2=0)
  

..
  The `vfmv.f.s` instruction copies a single SEW-wide element from index
  0 of the source vector register to a destination scalar floating-point
  register.

The `vfmv.f.s` instruction copies a single SEW-wide element from index 0 of the source vector register to a destination scalar floating-point register.

..
  The `vfmv.s.f` instruction copies the scalar floating-point register
  to element 0 of the destination vector register.  The other elements
  in the destination vector register ( 0 < index < VLEN/SEW) are treated
  as tail elements using the current tail agnostic/undisturbed policy.
  If `vstart` {ge} `vl`, no operation is performed and the destination
  register is not updated.

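The `vfmv.s.f` instruction copies the scalar floating-point register to element 0 of the destination vector register.
The other elements in the destination vector register (0 < index < VLEN/SEW) are treated as tail elements using the current tail agnostic/undisturbed policy.
If `vstart` ≥ `vl`, no operation is performed and the destination register is not updated.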

..
  NOTE: As a consequence, when `vl`=0, no elements are updated in the
  destination vector register group, regardless of `vstart`.

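.. note::

  As a consequence, when `vl`\ =0, no elements are updated in the destination vector register group, regardless of `vstart`.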
  
..
  The encodings corresponding to the masked versions (`vm=0`) of `vfmv.f.s`
  and `vfmv.s.f` are reserved.

The encodings corresponding to the masked versions (`vm=0`) of `vfmv.f.s` and `vfmv.s.f` are reserved.


************************************************************
Vector Slide Instructions
************************************************************

..
  The slide instructions move elements up and down a vector register
  group.

The slide instructions move elements up and down a vector register group.

..
  NOTE: The slide operations can be implemented much more efficiently
  than using the arbitrary register gather instruction.  Implementations
  may optimize certain OFFSET values for `vslideup` and `vslidedown`.
  In particular, power-of-2 offsets may operate substantially faster
  than other offsets.

.. note::

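  The slide operations can be implemented much more efficiently than using the arbitrary register gather instruction.
  Implementations may optimize certain OFFSET values for `vslideup` and `vslidedown`.
  In particular, power-of-2 offsets may operate substantially faster than other offsets.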
  
..
  For all of the `vslideup`, `vslidedown`, `v[f]slide1up`, and
  `v[f]slide1down` instructions, if `vstart` {ge} `vl`, the instruction performs no
  operation and leaves the destination vector register unchanged.

For all of the `vslideup`, `vslidedown`, `v[f]slide1up`, and `v[f]slide1down` instructions, if `vstart` ≥ `vl`,
the instruction performs no operation and leaves the destination vector register unchanged.

..
  NOTE: As a consequence, when `vl`=0, no elements are updated in the
  destination vector register group, regardless of `vstart`.

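.. note::

  As a consequence, when `vl`\ =0, no elements are updated in the destination vector register group, regardless of `vstart`.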
  
..
  The tail agnostic/undisturbed policy is followed for tail elements.

The tail agnostic/undisturbed policy is followed for tail elements.

..
  The slide instructions may be masked, with mask element *i*
  controlling whether *destination* element *i* is written.  The mask
  undisturbed/agnostic policy is followed for inactive elements.

The slide instructions may be masked, with mask element *i* controlling whether *destination* element *i* is written.
The mask undisturbed/agnostic policy is followed for inactive elements.


============================================================
Vector Slideup Instructions
============================================================

::

   vslideup.vx vd, vs2, rs1, vm        # vd[i+rs1] = vs2[i]
   vslideup.vi vd, vs2, uimm, vm       # vd[i+uimm] = vs2[i]
  

..
  For `vslideup`, the value in `vl` specifies the maximum number of destination
  elements that are written.  The start index (*OFFSET*) for the
  destination can be either specified using an unsigned integer in the
  `x` register specified by `rs1`, or a 5-bit immediate, zero-extended to XLEN bits.
  If XLEN > SEW, *OFFSET* is *not* truncated to SEW bits.
  Destination elements *OFFSET* through `vl`-1 are written if unmasked and
  if *OFFSET* < `vl`.

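For `vslideup`, the value in `vl` specifies the maximum number of destination elements that are written.
The start index (*OFFSET*) for the destination can be either specified using an unsigned integer in the `x` register specified by `rs1`,
or a 5-bit immediate, zero-extended to XLEN bits.
If XLEN > SEW, *OFFSET* is *not* truncated to SEW bits.
Destination elements *OFFSET* through `vl`-1 are written if unmasked and if *OFFSET* < `vl`.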

..
  ----
     vslideup behavior for destination elements
  
     OFFSET is amount to slideup, either from x register or a 5-bit immediate
  
                      0 <  i < max(vstart, OFFSET)  Unchanged
    max(vstart, OFFSET) <= i < vl                   vd[i] = vs2[i-OFFSET] if v0.mask[i] enabled
                     vl <= i < VLMAX                Follow tail policy
  ----


::

     vslideup behavior for destination elements

     OFFSET is amount to slideup, either from x register or a 5-bit immediate

                      0 <= i < max(vstart, OFFSET)  Unchanged
    max(vstart, OFFSET) <= i < vl                   vd[i] = vs2[i-OFFSET] if v0.mask[i] enabled
                     vl <= i < VLMAX                Follow tail policy
  

..
  The destination vector register group for `vslideup` cannot overlap
  the source vector register group, otherwise the instruction encoding
  is reserved.

The destination vector register group for `vslideup` cannot overlap the source vector register group, otherwise the instruction encoding is reserved.
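
The destination-element rules of `vslideup` can be sketched as a Python model over plain lists. The sketch leaves tail and inactive elements undisturbed, which is one of the permitted outcomes, and the function and parameter names are illustrative.

.. code-block:: python

   def vslideup(vd, vs2, offset, vl, vstart=0, v0=None):
       """Return the new vd; elements outside the written range keep old values."""
       out = list(vd)
       for i in range(max(vstart, offset), vl):
           if v0 is None or v0[i]:
               out[i] = vs2[i - offset]
       return out

   vd = [90, 91, 92, 93, 94, 95, 96, 97]
   vs2 = [10, 11, 12, 13, 14, 15, 16, 17]
   assert vslideup(vd, vs2, offset=3, vl=8) == [90, 91, 92, 10, 11, 12, 13, 14]
   assert vslideup(vd, vs2, offset=3, vl=6) == [90, 91, 92, 10, 11, 12, 96, 97]
   assert vslideup(vd, vs2, offset=9, vl=8) == vd   # OFFSET >= vl: nothing written

Elements below *OFFSET* are never written, which is why the destination keeps its previous contents there.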

..
  NOTE: The non-overlap constraint avoids WAR hazards on the
  input vectors during execution, and enables restart with non-zero
  `vstart`.

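.. note::

  The non-overlap constraint avoids WAR hazards on the input vectors during execution, and enables restart with non-zero `vstart`.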
  
============================================================
Vector Slidedown Instructions
============================================================

::

   vslidedown.vx vd, vs2, rs1, vm       # vd[i] = vs2[i+rs1]
   vslidedown.vi vd, vs2, uimm, vm      # vd[i] = vs2[i+uimm]
  

..
  For `vslidedown`, the value in `vl` specifies the maximum number of
  destination elements that are written.  The remaining elements past
  `vl` are handled according to the current tail policy (Section
  :ref:`sec-agnostic` ).

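For `vslidedown`, the value in `vl` specifies the maximum number of destination elements that are written.
The remaining elements past `vl` are handled according to the current tail policy (Section :ref:`sec-agnostic`).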

..
  The start index (*OFFSET*) for the source can be either specified
  using an unsigned integer in the `x` register specified by `rs1`, or a
  5-bit immediate, zero-extended to XLEN bits.
  If XLEN > SEW, *OFFSET* is *not* truncated to SEW bits.

The start index (*OFFSET*) for the source can be either specified using an unsigned integer in the `x` register specified by `rs1`,
or a 5-bit immediate, zero-extended to XLEN bits.
If XLEN > SEW, *OFFSET* is *not* truncated to SEW bits.
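
The source and destination rules tabulated in this section can be sketched as a Python model over plain lists. Tail and inactive elements are left undisturbed here, which is one permitted outcome, and ``vlmax`` is passed explicitly as an assumption of the sketch.

.. code-block:: python

   def vslidedown(vd, vs2, offset, vl, vlmax, vstart=0, v0=None):
       """Return the new vd; sources at or past VLMAX read as zero."""
       out = list(vd)
       for i in range(vstart, vl):
           if v0 is None or v0[i]:
               out[i] = vs2[i + offset] if i + offset < vlmax else 0
       return out

   vs2 = [10, 11, 12, 13, 14, 15, 16, 17]
   vd = [90] * 8
   assert vslidedown(vd, vs2, offset=3, vl=8, vlmax=8) == [13, 14, 15, 16, 17, 0, 0, 0]
   assert vslidedown(vd, vs2, offset=3, vl=4, vlmax=8) == [13, 14, 15, 16, 90, 90, 90, 90]

In the second call the source elements 4 to 6 lie past `vl` but below VLMAX, so they are still read from `vs2` rather than replaced by zero.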

::

    vslidedown behavior for source elements for element i in slide
                     0 <= i+OFFSET < VLMAX   src[i] = vs2[i+OFFSET]
                 VLMAX <= i+OFFSET           src[i] = 0
  
    vslidedown behavior for destination element i in slide
                     0 <= i < vstart         Unchanged
                vstart <= i < vl             vd[i] = src[i] if v0.mask[i] enabled
                    vl <= i < VLMAX          Follow tail policy
  
  
============================================================
Vector Slide1up
============================================================

..
  Variants of slide are provided that only move by one element but which
  also allow a scalar integer value to be inserted at the vacated
  element position.

Variants of slide are provided that only move by one element but which
also allow a scalar integer value to be inserted at the vacated element position.

::

   vslide1up.vx  vd, vs2, rs1, vm        # vd[0]=x[rs1], vd[i+1] = vs2[i]
   vfslide1up.vf vd, vs2, rs1, vm        # vd[0]=f[rs1], vd[i+1] = vs2[i]
  

..
  The `vslide1up` instruction places the `x` register argument at
  location 0 of the destination vector register group, provided that
  element 0 is active, otherwise the destination element update follows the
  current mask agnostic/undisturbed policy.  If XLEN < SEW, the value is
  sign-extended to SEW bits.  If XLEN > SEW, the least-significant bits
  are copied over and the high SEW-XLEN bits are ignored.


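The `vslide1up` instruction places the `x` register argument at location 0 of the destination vector register group, provided that element 0 is active,
otherwise the destination element update follows the current mask agnostic/undisturbed policy.
If XLEN < SEW, the value is sign-extended to SEW bits.
If XLEN > SEW, the least-significant bits are copied over and the high XLEN-SEW bits are ignored.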

..
  The remaining active `vl`-1 elements are copied over from index *i* in
  the source vector register group to index *i*+1 in the destination
  vector register group.

The remaining active `vl`-1 elements are copied over from index *i* in the source vector register group to index *i* + 1 in the destination vector register group.

..
  The `vl` register specifies the maximum number of destination vector
  register elements updated with source values, and remaining elements
  past `vl` are handled according to the current tail policy (Section
  :ref:`sec-agnostic` ).

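The `vl` register specifies the maximum number of destination vector register elements updated with source values,
and remaining elements past `vl` are handled according to the current tail policy (Section :ref:`sec-agnostic`).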

::

     vslide1up behavior
  
                      i < vstart  unchanged
                  0 = i = vstart  vd[i] = x[rs1] if v0.mask[i] enabled
    max(vstart, 1) <= i < vl      vd[i] = vs2[i-1] if v0.mask[i] enabled
                vl <= i < VLMAX   Follow tail policy
  

..
  The `vslide1up` instruction requires that the destination vector
  register group does not overlap the source vector register group.
  Otherwise, the instruction encoding is reserved.

The `vslide1up` instruction requires that the destination vector register group does not overlap the source vector register group.
Otherwise, the instruction encoding is reserved.

..
  The `vfslide1up` instruction is defined analogously, but sources its
  scalar argument from an `f` register.

The `vfslide1up` instruction is defined analogously, but sources its scalar argument from an `f` register.
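
The `vslide1up` behavior table can be sketched as a Python model over plain lists. Inactive and tail elements are left undisturbed here, which is one permitted outcome; width conversion of the scalar is not modelled, and the names are illustrative.

.. code-block:: python

   def vslide1up(vd, vs2, scalar, vl, vstart=0, v0=None):
       """Element 0 receives the scalar; element i receives vs2[i-1]."""
       out = list(vd)
       for i in range(vstart, vl):
           if v0 is None or v0[i]:
               out[i] = scalar if i == 0 else vs2[i - 1]
       return out

   vs2 = [10, 11, 12, 13]
   assert vslide1up([0] * 4, vs2, scalar=99, vl=4) == [99, 10, 11, 12]
   # Element 0 inactive: the scalar is not written.
   assert vslide1up([0] * 4, vs2, scalar=99, vl=4, v0=[0, 1, 1, 1]) == [0, 10, 11, 12]
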


============================================================
Vector Slide1down Instruction
============================================================

..
  The `vslide1down` instruction copies the first `vl`-1 active elements
  values from index *i*+1 in the source vector register group to index
  *i* in the destination vector register group.

The `vslide1down` instruction copies the values of the first `vl`-1 active elements
from index *i* + 1 in the source vector register group to index *i* in the destination vector register group.

..
  The `vl` register specifies the maximum number of destination vector
  register elements written with source values, and remaining elements
  past `vl` are handled according to the current tail policy (Section
  :ref:`sec-agnostic` ).

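The `vl` register specifies the maximum number of destination vector register elements written with source values,
and remaining elements past `vl` are handled according to the current tail policy (Section :ref:`sec-agnostic`).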

::

   vslide1down.vx  vd, vs2, rs1, vm      # vd[i] = vs2[i+1], vd[vl-1]=x[rs1]
   vfslide1down.vf vd, vs2, rs1, vm      # vd[i] = vs2[i+1], vd[vl-1]=f[rs1]
  

..
  The `vslide1down` instruction places the `x` register argument at
  location `vl`-1 in the destination vector register, provided that
  element `vl-1` is active, otherwise the destination element is
  unchanged. If XLEN < SEW, the value is sign-extended to SEW bits.  If
  XLEN > SEW, the least-significant bits are copied over and the high
  SEW-XLEN bits are ignored.

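The `vslide1down` instruction places the `x` register argument at location `vl`-1 in the destination vector register, provided that element `vl-1` is active;
otherwise the destination element is unchanged.
If XLEN < SEW, the value is sign-extended to SEW bits.
If XLEN > SEW, the least-significant SEW bits are copied over and the upper XLEN-SEW bits are ignored.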

::

     vslide1down behavior
  
                         i < vstart  unchanged
               vstart <= i < vl-1    vd[i] = vs2[i+1] if v0.mask[i] enabled
               vstart <= i = vl-1    vd[vl-1] = x[rs1] if v0.mask[i] enabled
                   vl <= i < VLMAX   Follow tail policy
  

..
  The `vfslide1down` instruction is defined analogously, but sources its
  scalar argument from an `f` register.

The `vfslide1down` instruction is defined analogously, but sources its scalar argument from an `f` register.


..
  NOTE: The `vslide1down` instruction can be used to load values into a
  vector register without using memory and without disturbing other
  vector registers.  This provides a path for debuggers to modify the
  contents of a vector register, albeit slowly, with multiple repeated
  `vslide1down` invocations.

.. note::

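  The `vslide1down` instruction can be used to load values into a vector register without using memory and without disturbing other vector registers.
  This provides a path for debuggers to modify the contents of a vector register, albeit slowly, with multiple repeated `vslide1down` invocations.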
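
The repeated-invocation technique described in the note can be sketched with a small Python model.
This is an illustrative sketch under stated assumptions, not specification text: `vslide1down` and `load_register` are hypothetical names, registers are modeled as lists, and inactive and tail elements are left undisturbed.

.. code-block:: python

  def vslide1down(vd, vs2, x_rs1, vl, vstart=0, mask=None):
      """Return the new contents of vd after one vslide1down step."""
      result = list(vd)
      for i in range(vstart, vl):
          if mask is not None and not mask[i]:
              continue  # inactive element: left undisturbed in this sketch
          # The last body element takes the scalar; others take element i+1.
          result[i] = x_rs1 if i == vl - 1 else vs2[i + 1]
      return result


  def load_register(values):
      """Fill one register using only repeated vslide1down steps."""
      vl = len(values)
      reg = [0] * vl
      for v in values:
          # Source and destination are the same register, so no other
          # vector register is disturbed.
          reg = vslide1down(reg, reg, v, vl)
      return reg

Each step shifts the existing contents down by one element and appends the next scalar at index `vl`-1, so `vl` invocations are needed to fill the register.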
  

*******************************************
Vector Register Gather Instructions
*******************************************

..
  The vector register gather instructions read elements from a first
  source vector register group at locations given by a second source
  vector register group.  The index values in the second vector are
  treated as unsigned integers.  The source vector can be read at any
  index < VLMAX regardless of `vl`.  The maximum number of elements to write to
  the destination register is given by `vl`, and the remaining elements
  past `vl` are handled according to the current tail policy
  (Section :ref:`sec-agnostic` ).  The operation can be masked, and the mask
  undisturbed/agnostic policy is followed for inactive elements.


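The vector register gather instructions read elements from a first source vector register group at locations given by a second source vector register group.
The index values in the second vector are treated as unsigned integers.
The source vector can be read at any index < VLMAX regardless of `vl`.
The maximum number of elements to write to the destination register is given by `vl`, and the remaining elements past `vl` are handled according to the current tail policy (Section :ref:`sec-agnostic`).
The operation can be masked, and the mask undisturbed/agnostic policy is followed for inactive elements.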


::

  vrgather.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];
  vrgatherei16.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];
  

..
  The `vrgather.vv` form uses SEW/LMUL for both the data and
  indices. The `vrgatherei16.vv` form uses SEW/LMUL for the data in
  `vs2` but EEW=16 and EMUL = (16/SEW)*LMUL for the indices in `vs1`.

The `vrgather.vv` form uses SEW/LMUL for both the data and the indices.
The `vrgatherei16.vv` form uses SEW/LMUL for the data in `vs2`,
but EEW=16 and EMUL = (16/SEW)*LMUL for the indices in `vs1`.

..
  NOTE: When SEW=8, `vrgather.vv` can only reference vector elements
  0-255.  The `vrgatherei16` form can index 64K elements, and can also
  be used to reduce the register capacity needed to hold indices when
  SEW > 16.

.. note::

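  When SEW=8, `vrgather.vv` can only reference vector elements 0-255.
  The `vrgatherei16` form can index 64K elements, and can also be used to reduce the register capacity needed to hold indices when SEW > 16.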
  
..
  If an element index is out of range ( `vs1[i]` {ge} VLMAX )
  then zero is returned for the element value.

If an element index is out of range (`vs1[i]` ≥ VLMAX), then zero is returned for the element value.

..
  Vector-scalar and vector-immediate forms of the register gather are
  also provided.  These read one element from the source vector at the
  given index, and write this value to the active elements at the start
  of the destination vector register. The index value in the scalar
  register and the immediate, zero-extended to XLEN bits, are treated as
  unsigned integers.  If XLEN > SEW, the index value is *not* truncated
  to SEW bits.

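Vector-scalar and vector-immediate forms of the register gather are also provided.
These read one element from the source vector at the given index, and write this value to the active elements at the start of the destination vector register.
The index value in the scalar register and the immediate, zero-extended to XLEN bits, are treated as unsigned integers.
If XLEN > SEW, the index value is *not* truncated to SEW bits.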

..
  NOTE: These forms allow any vector element to be "splatted" to an entire vector.

.. note::

  These forms allow any vector element to be "splatted" to an entire vector.
  
::

  vrgather.vx vd, vs2, rs1, vm # vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]]
  vrgather.vi vd, vs2, uimm, vm # vd[i] = (uimm >= VLMAX) ? 0 : vs2[uimm]
  

..
  For any `vrgather` instruction, the destination vector register group
  cannot overlap with the source vector register groups, otherwise the
  instruction encoding is reserved.

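For any `vrgather` instruction, the destination vector register group cannot overlap with the source vector register groups; otherwise the instruction encoding is reserved.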
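
The gather semantics above, including the out-of-range rule and the splat behavior of the scalar form, can be sketched as follows.
This is a minimal illustrative model, not specification text: `vrgather_vv` and `vrgather_vx` are hypothetical names, `vstart` is assumed to be 0, registers are lists holding VLMAX elements, and inactive and tail elements are left undisturbed.

.. code-block:: python

  def vrgather_vv(vd, vs2, vs1, vl, vlmax, mask=None):
      """vd[i] = 0 if vs1[i] >= VLMAX else vs2[vs1[i]], for active i < vl."""
      result = list(vd)
      for i in range(vl):
          if mask is not None and not mask[i]:
              continue  # inactive element: left undisturbed in this sketch
          index = vs1[i]  # indices are treated as unsigned integers
          # Any index below VLMAX may be read, even one at or past vl.
          result[i] = vs2[index] if index < vlmax else 0
      return result


  def vrgather_vx(vd, vs2, index, vl, vlmax, mask=None):
      """Scalar form: the same source element goes to every active element."""
      return vrgather_vv(vd, vs2, [index] * vl, vl, vlmax, mask)

With `vl` = 4 and VLMAX = 8, an index of 7 still reads a valid source element, while an index of 9 yields zero.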


*****************************
Vector Compress Instruction
*****************************

..
  The vector compress instruction allows elements selected by a vector
  mask register from a source vector register group to be packed into
  contiguous elements at the start of the destination vector register
  group.

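The vector compress instruction allows elements selected by a vector mask register from a source vector register group
to be packed into contiguous elements at the start of the destination vector register group.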

..
  ----
    vcompress.vm vd, vs2, vs1  # Compress into vd elements of vs2 where vs1 is enabled
  ----

::

    vcompress.vm vd, vs2, vs1  # Compress into vd elements of vs2 where vs1 is enabled
  

..
  The vector mask register specified by `vs1` indicates which of the
  first `vl` elements of vector register group `vs2` should be extracted
  and packed into contiguous elements at the beginning of vector
  register `vd`. The remaining elements of `vd` are treated as tail
  elements according to the current tail policy (Section
  :ref:`sec-agnostic` ).

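The vector mask register specified by `vs1` indicates which of the first `vl` elements of vector register group `vs2`
should be extracted and packed into contiguous elements at the beginning of vector register `vd`.
The remaining elements of `vd` are treated as tail elements according to the current tail policy (Section :ref:`sec-agnostic`).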

..
  ----
      Example use of vcompress instruction
  
          1 1 0 1 0 0 1 0 1   v0
          8 7 6 5 4 3 2 1 0   v1
          1 2 3 4 5 6 7 8 9   v2
  
                                  vcompress.vm v2, v1, v0
          1 2 3 4 8 7 5 2 0   v2
  ----

::

      Example use of vcompress instruction
  
          1 1 0 1 0 0 1 0 1   v0
          8 7 6 5 4 3 2 1 0   v1
          1 2 3 4 5 6 7 8 9   v2
  
                                  vcompress.vm v2, v1, v0
          1 2 3 4 8 7 5 2 0   v2
  

..
  `vcompress` is encoded as an unmasked instruction (`vm=1`). The equivalent
  masked instruction (`vm=0`) is reserved.

`vcompress` is encoded as an unmasked instruction (`vm=1`).
The equivalent masked instruction (`vm=0`) is reserved.
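
The packing behavior shown in the example above can be reproduced with a short Python model.
This is an illustrative sketch only: `vcompress` here is a hypothetical function operating on lists, index 0 of each list corresponds to the rightmost column of the diagram, and the tail elements are left undisturbed, which is one of the outcomes the tail policy permits.

.. code-block:: python

  def vcompress(vd, vs2, vs1_mask, vl):
      """Pack the elements of vs2 selected by vs1_mask into the start of vd."""
      result = list(vd)
      out = 0  # next free position in the destination
      for i in range(vl):
          if vs1_mask[i]:
              result[out] = vs2[i]
              out += 1
      return result  # positions from `out` onward keep their old values


  # Registers from the example, listed from element 0 upward.
  v0 = [1, 0, 1, 0, 0, 1, 0, 1, 1]
  v1 = [0, 1, 2, 3, 4, 5, 6, 7, 8]
  v2 = [9, 8, 7, 6, 5, 4, 3, 2, 1]
  packed = vcompress(v2, v1, v0, vl=9)

Reading `packed` from the highest index down to index 0 gives the final `v2` row of the example.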

..
  The destination vector register group cannot overlap the source vector
  register group or the source mask register, otherwise the instruction
  encoding is reserved.

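The destination vector register group cannot overlap the source vector register group or the source mask register; otherwise the instruction encoding is reserved.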

..
  A trap on a `vcompress` instruction is always reported with a
  `vstart` of 0.  Executing a `vcompress` instruction with a non-zero
  `vstart` raises an illegal instruction exception.

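A trap on a `vcompress` instruction is always reported with a `vstart` of 0.
Executing a `vcompress` instruction with a non-zero `vstart` raises an illegal instruction exception.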

..
  NOTE: Although possible, `vcompress` is one of the more difficult
  instructions to restart with a non-zero `vstart`, so assumption is
  implementations will choose not do that but will instead restart from
  element 0.  This does mean elements in destination register after
  `vstart` will already have been updated.

.. note::

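  Although possible, `vcompress` is one of the more difficult instructions to restart with a non-zero `vstart`,
  so the assumption is that implementations will choose not to do that but will instead restart from element 0.
  This does mean elements in the destination register after `vstart` will already have been updated.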
  

============================
Synthesizing `vdecompress`
============================

..
  There is no inverse `vdecompress` provided, as this operation can be
  readily synthesized using iota and a masked vrgather:

There is no inverse `vdecompress` instruction provided, as this operation can be readily synthesized using iota and a masked `vrgather`:

..
  ----
      Desired functionality of 'vdecompress'
        7 6 5 4 3 2 1 0     # vid
  
              e d c b a     # packed vector of 5 elements
        1 0 0 1 1 1 0 1     # mask vector of 8 elements
        p q r s t u v w     # destination register before vdecompress
  
        e q r d c b v a     # result of vdecompress
  ----

::

      Desired functionality of 'vdecompress'
        7 6 5 4 3 2 1 0     # vid
  
              e d c b a     # packed vector of 5 elements
        1 0 0 1 1 1 0 1     # mask vector of 8 elements
        p q r s t u v w     # destination register before vdecompress
  
        e q r d c b v a     # result of vdecompress
  

..
  ----
       # v0 holds mask
       # v1 holds packed data
       # v11 holds input expanded vector and result
       viota.m v10, v0                 # Calc iota from mask in v0
       vrgather.vv v11, v1, v10, v0.t  # Expand into destination
  ----
  ----
     p q r s t u v w    # v11 destination register
           e d c b a    # v1 source vector
     1 0 0 1 1 1 0 1    # v0 mask vector
  
     4 4 4 3 2 1 1 0    # v10 result of viota.m
     e q r d c b v a    # v11 destination after vrgather using viota.m under mask
  ----

::

       # v0 holds mask
       # v1 holds packed data
       # v11 holds input expanded vector and result
       viota.m v10, v0                 # Calc iota from mask in v0
       vrgather.vv v11, v1, v10, v0.t  # Expand into destination
  
::

     p q r s t u v w    # v11 destination register
           e d c b a    # v1 source vector
     1 0 0 1 1 1 0 1    # v0 mask vector
  
     4 4 4 3 2 1 1 0    # v10 result of viota.m
     e q r d c b v a    # v11 destination after vrgather using viota.m under mask
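
The two-instruction sequence can be checked against the example with a short Python model.
This is an illustrative sketch, not specification text: `viota` and `vdecompress` are hypothetical list-based helpers, index 0 of each list corresponds to the rightmost column of the diagrams, and inactive elements are left undisturbed.

.. code-block:: python

  def viota(mask, vl):
      """out[i] = number of set mask bits at indices below i."""
      out, count = [], 0
      for i in range(vl):
          out.append(count)
          if mask[i]:
              count += 1
      return out


  def vdecompress(dest, packed, mask, vl):
      """Expand packed elements into the active positions of dest."""
      index = viota(mask, vl)  # models viota.m v10, v0
      result = list(dest)
      for i in range(vl):
          if mask[i]:
              # Masked gather: models vrgather.vv v11, v1, v10, v0.t
              result[i] = packed[index[i]]
      return result


  mask = [1, 0, 1, 1, 1, 0, 0, 1]   # element 0 first
  dest = list("wvutsrqp")           # element 0 first
  expanded = vdecompress(dest, list("abcde"), mask, vl=8)

Reading `expanded` from the highest index down to index 0 gives the `e q r d c b v a` row of the example.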
  

*******************************************
Whole Vector Register Move
*******************************************

..
  The `vmv<nr>r.v` instructions copy whole vector registers (i.e., all
  VLEN bits) and can copy whole vector register groups.  The
  instructions operate as if EEW=SEW, EMUL = `nr`, effective length
  `evl`= EMUL * VLEN/SEW.

The `vmv<nr>r.v` instructions copy whole vector registers (i.e., all VLEN bits)
and can copy whole vector register groups.
The instructions operate as if EEW=SEW, EMUL = `nr`, and the effective length `evl` = EMUL * VLEN/SEW.

..
  NOTE: These instructions are intended to aid compilers to shuffle
  vector registers without needing to know or change `vl` or `vtype`.

.. note::

  These instructions are intended to aid compilers to shuffle vector registers without needing to know or change `vl` or `vtype`.
  
..
  NOTE: The usual property that no elements are written if `vstart` {ge} `vl`
  does not apply to these instructions.
  Instead, no elements are written if `vstart` {ge} `evl`.

.. note::

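  The usual property that no elements are written if `vstart` ≥ `vl` does not apply to these instructions.
  Instead, no elements are written if `vstart` ≥ `evl`.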
  
..
  NOTE: If `vd` is equal to `vs2` the instruction is an architectural
  NOP, but is treated as a hint to implementations that rearrange data
  internally that the register group will next be accessed with an EEW
  equal to SEW.

.. note::

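  If `vd` is equal to `vs2` the instruction is an architectural NOP, but is treated as a hint to implementations that rearrange data internally
  that the register group will next be accessed with an EEW equal to SEW.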
  
..
  The instruction is encoded as an OPIVI instruction.  The number of
  vector registers to copy is encoded in the low three bits of the
  `simm` field using the same encoding as the `nf` field for memory
  instructions, i.e., `simm` = `nr-1`.
  The value of the `nr` field must be 1, 2, 4, or 8, with other values reserved.

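The instruction is encoded as an OPIVI instruction. The number of vector registers to copy is encoded in the low three bits of the `simm` field
using the same encoding as the `nf` field for memory instructions, i.e., `simm` = `nr-1`.
The value of the `nr` field must be 1, 2, 4, or 8, with other values reserved.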

..
  NOTE: A future extension may support other numbers of registers to be moved.
  Values of `simm` other than 0, 1, 3, and 7 are currently reserved.

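.. note::

  A future extension may support other numbers of registers to be moved.
  Values of `simm` other than 0, 1, 3, and 7 are currently reserved.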
  
..
  NOTE: The instruction uses the same funct6 encoding as the `vsmul`
  instruction but with an immediate operand, and only the unmasked
  version (`vm=1`).  This encoding is chosen as it is close to the
  related `vmerge` encoding, and it is unlikely the `vsmul` instruction
  would benefit from an immediate form.

.. note::

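  The instruction uses the same funct6 encoding as the `vsmul` instruction but with an immediate operand, and only the unmasked version (`vm=1`).
  This encoding is chosen as it is close to the related `vmerge` encoding, and it is unlikely the `vsmul` instruction would benefit from an immediate form.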
  
..
  ----
      vmv<nr>r.v vd, vs2  # General form
  
      vmv1r.v v1, v2   #  Copy v1=v2
      vmv2r.v v10, v12 #  Copy v10=v12; v11=v13
      vmv4r.v v4, v8   #  Copy v4=v8; v5=v9; v6=v10; v7=v11
      vmv8r.v v0, v8   #  Copy v0=v8; v1=v9; ...;  v7=v15
  ----


::

      vmv<nr>r.v vd, vs2  # General form
  
      vmv1r.v v1, v2   #  Copy v1=v2
      vmv2r.v v10, v12 #  Copy v10=v12; v11=v13
      vmv4r.v v4, v8   #  Copy v4=v8; v5=v9; v6=v10; v7=v11
      vmv8r.v v0, v8   #  Copy v0=v8; v1=v9; ...;  v7=v15
  

..
  The source and destination vector register numbers must be aligned
  appropriately for the vector register group size, and encodings with
  other vector register numbers are reserved.

The source and destination vector register numbers must be aligned appropriately for the vector register group size,
and encodings with other vector register numbers are reserved.
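
The encoding and alignment rules for `vmv<nr>r.v` can be summarized in a small checker.
This is a hedged sketch rather than a real assembler API: `check_vmv_nr_r` is a hypothetical name, it returns only the `simm` value and the effective length `evl`, and it reports reserved encodings by raising `ValueError`.

.. code-block:: python

  def check_vmv_nr_r(nr, vd, vs2, vlen, sew):
      """Validate a vmv<nr>r.v and return its simm field and evl."""
      if nr not in (1, 2, 4, 8):
          raise ValueError("reserved: nr must be 1, 2, 4, or 8")
      if vd % nr != 0 or vs2 % nr != 0:
          raise ValueError("reserved: register numbers must be aligned to nr")
      return {
          "simm": nr - 1,           # same encoding as the nf field
          "evl": nr * vlen // sew,  # EMUL * VLEN / SEW with EMUL = nr
      }

For instance, `vmv4r.v v4, v8` with VLEN=128 and SEW=32 gives `simm` = 3 and `evl` = 16, whereas a `vmv2r.v` with an odd destination register number is rejected as reserved.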

..
  NOTE: A future extension may relax the vector register alignment
  restrictions.

.. note::

  A future extension may relax the vector register alignment restrictions.
  
####################
Exception Handling
####################

..
  On a trap during a vector instruction (caused by either a synchronous
  exception or an asynchronous interrupt), the existing `*epc` CSR is
  written with a pointer to the errant vector instruction, while the
  `vstart` CSR contains the element index that caused the trap to be
  taken.

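On a trap during a vector instruction (caused by either a synchronous exception or an asynchronous interrupt),
the existing `*epc` CSR is written with a pointer to the errant vector instruction,
while the `vstart` CSR contains the element index that caused the trap to be taken.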

..
  NOTE: We chose to add a `vstart` CSR to allow resumption of a
  partially executed vector instruction to reduce interrupt latencies
  and to simplify forward-progress guarantees.  This is similar to the
  scheme in the IBM 3090 vector facility.  To ensure forward progress
  without the `vstart` CSR, implementations would have to guarantee an
  entire vector instruction can always complete atomically without
  generating a trap.  This is particularly difficult to ensure in the
  presence of strided or scatter/gather operations and demand-paged
  virtual memory.

.. note::

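  We chose to add a `vstart` CSR to allow resumption of a partially executed vector instruction to reduce interrupt latencies and to simplify forward-progress guarantees.
  This is similar to the scheme in the IBM 3090 vector facility.
  To ensure forward progress without the `vstart` CSR, implementations would have to guarantee an entire vector instruction can always complete atomically without generating a trap.
  This is particularly difficult to ensure in the presence of strided or scatter/gather operations and demand-paged virtual memory.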
  

****************************
Precise vector traps
****************************

..
  NOTE: We assume most supervisor-mode environments with demand-paging
  will require precise vector traps.

.. note::

  We assume most supervisor-mode environments with demand-paging will require precise vector traps.
  
..
  Precise vector traps require that:
  
  * all instructions older than the trapping vector instruction have committed their results
  * no instructions newer than the trapping vector instruction have altered architectural state
  * any operations within the trapping vector instruction affecting result elements preceding the index in the `vstart` CSR have committed their results
  * no operations within the trapping vector instruction affecting elements at or following the `vstart` CSR have altered architectural state except if restarting and completing the affected vector instruction will nevertheless produce the correct final state.


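Precise vector traps require that:

* all instructions older than the trapping vector instruction have committed their results
* no instructions newer than the trapping vector instruction have altered architectural state
* any operations within the trapping vector instruction affecting result elements preceding the index in the `vstart` CSR have committed their results
* no operations within the trapping vector instruction affecting elements at or following the `vstart` CSR have altered architectural state, except if restarting and completing the affected vector instruction will nevertheless produce the correct final state.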

..
  We relax the last requirement to allow elements following `vstart` to
  have been updated at the time the trap is reported, provided that
  re-executing the instruction from the given `vstart` will correctly
  overwrite those elements.

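We relax the last requirement to allow elements following `vstart` to have been updated at the time the trap is reported,
provided that re-executing the instruction from the given `vstart` will correctly overwrite those elements.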

..
  In idempotent memory regions, vector store instructions may have
  updated elements in memory past the element causing a synchronous
  trap.  Non-idempotent memory regions must not have been updated for
  indices equal to or greater than the element that caused a synchronous
  trap during a vector store instruction.

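In idempotent memory regions, vector store instructions may have updated elements in memory past the element causing a synchronous trap.
Non-idempotent memory regions must not have been updated for indices equal to or greater than the element that caused a synchronous trap during a vector store instruction.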

..
  Except where noted above, vector instructions are allowed to overwrite
  their inputs, and so in most cases, the vector instruction restart
  must be from the `vstart` location. However, there are a number of
  cases where this overwrite is prohibited to enable execution of the
  vector instructions to be idempotent and hence restartable from any
  location.

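Except where noted above, vector instructions are allowed to overwrite their inputs, and so in most cases,
the vector instruction restart must be from the `vstart` location.
However, there are a number of cases where this overwrite is prohibited so that execution of the vector instruction is idempotent
and hence restartable from any location.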

..
  Implementations must ensure forward progress can be eventually
  guaranteed for the element or segment reported by `vstart`.

Implementations must ensure forward progress can be eventually guaranteed for the element or segment reported by `vstart`.


****************************
Imprecise vector traps
****************************

..
  Imprecise vector traps are traps that are not precise.  In particular,
  instructions newer than `*epc` may have committed results, and
  instructions older than `*epc` may have not completed execution.
  Imprecise traps are primarily intended to be used in situations where
  reporting an error and terminating execution is the appropriate
  response.

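Imprecise vector traps are traps that are not precise.
In particular, instructions newer than `*epc` may have committed results, and instructions older than `*epc` may not have completed execution.
Imprecise traps are primarily intended to be used in situations where reporting an error and terminating execution is the appropriate response.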

..
  NOTE: A profile might specify that interrupts are precise while other
  traps are imprecise.  We assume many embedded implementations will
  generate only imprecise traps for vector instructions on fatal errors,
  as they will not require resumable traps.

.. note::

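  A profile might specify that interrupts are precise while other traps are imprecise.
  We assume many embedded implementations will generate only imprecise traps for vector instructions on fatal errors,
  as they will not require resumable traps.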
  
..
  Imprecise traps shall report the faulting element in `vstart` for
  traps caused by synchronous vector exceptions.

Imprecise traps shall report the faulting element in `vstart` for traps caused by synchronous vector exceptions.


****************************************
Selectable precise/imprecise traps
****************************************

..
  Some profiles may choose to provide a privileged mode bit to select
  between precise and imprecise vector traps.  Imprecise mode would run
  at high-performance but possibly make it difficult to discern error
  causes, while precise mode would run more slowly, but support
  debugging of errors albeit with a possibility of not experiencing the
  same errors as in imprecise mode.

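Some profiles may choose to provide a privileged mode bit to select between precise and imprecise vector traps.
Imprecise mode would run at high performance but possibly make it difficult to discern error causes.
Precise mode would run more slowly but support debugging of errors,
albeit with a possibility of not experiencing the same errors as in imprecise mode.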


****************************
Swappable traps
****************************

..
  Another trap mode can support swappable state in the vector unit,
  where on a trap, special instructions can save and restore the vector
  unit microarchitectural state, to allow execution to continue
  correctly around imprecise traps.

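Another trap mode can support swappable state in the vector unit,
where on a trap, special instructions can save and restore the vector unit microarchitectural state,
to allow execution to continue correctly around imprecise traps.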

..
  This mechanism is not defined in the current standard extensions.

This mechanism is not defined in the current standard extensions.

..
  NOTE: A future extension might define a standard way of saving and
  restoring opaque microarchitectural state from a vector unit
  implementation to support context switching with imprecise traps.

.. note::

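  A future extension might define a standard way of saving and restoring opaque microarchitectural state from a vector unit implementation
  to support context switching with imprecise traps.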
  
.. _sec-vector-extensions:

############################
Standard Vector Extensions
############################

..
  This section describes the standard vector extensions to be proposed
  for public review.  A set of smaller extensions intended for embedded
  use are named with a "Zve" prefix, while a larger vector extension
  designed for application processors is named as a single-letter V
  extension.

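This section describes the standard vector extensions to be proposed for public review.
A set of smaller extensions intended for embedded use are named with a "Zve" prefix,
while a larger vector extension designed for application processors is named as a single-letter V extension.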

..
  The initial vector extensions are designed to act as a base for
  additional vector extensions in various domains, including
  cryptography and machine learning.

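The initial vector extensions are designed to act as a base for additional vector extensions in various domains, including cryptography and machine learning.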


**********************************************************
Zve*: Vector Extensions for Embedded Processors
**********************************************************

..
  The following five standard extensions are defined to provide varying
  degrees of vector support and are intended for use with embedded
  processors.  Any of these extensions can be added to base ISAs with
  XLEN=32 or XLEN=64.  The table lists the minimum VLEN and supported
  EEWs for each extension as well as what floating-point types are
  supported.

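The following five standard extensions are defined to provide varying degrees of vector support
and are intended for use with embedded processors.
Any of these extensions can be added to base ISAs with XLEN=32 or XLEN=64.
The table lists the minimum VLEN and supported EEWs for each extension as well as what floating-point types are supported.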

..
  .Embedded vector extensions
  [cols="1,1,2,1,1"]
  [%autowidth]
  |===
  | Extension | Minimum VLEN | Supported EEW |  FP32 | FP64
  
  | Zve32x    | 32    | 8, 16, 32     |   N   |  N
  | Zve32f    | 32    | 8, 16, 32     |   Y   |  N
  | Zve64x    | 64    | 8, 16, 32, 64 |   N   |  N
  | Zve64f    | 64    | 8, 16, 32, 64 |   Y   |  N
  | Zve64d    | 64    | 8, 16, 32, 64 |   Y   |  Y
  |===


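.. table:: Embedded vector extensions

  ========= ============ ============= ==== ====
  Extension Minimum VLEN Supported EEW FP32 FP64
  ========= ============ ============= ==== ====
  Zve32x    32           8, 16, 32     N    N
  Zve32f    32           8, 16, 32     Y    N
  Zve64x    64           8, 16, 32, 64 N    N
  Zve64f    64           8, 16, 32, 64 Y    N
  Zve64d    64           8, 16, 32, 64 Y    Y
  ========= ============ ============= ==== ====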


..
  All Zve* extensions have precise traps.

All Zve* extensions have precise traps.

..
  NOTE: There is currently no standard support for handling imprecise
  traps, so standard extensions have to provide precise traps.

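.. note::

  There is currently no standard support for handling imprecise traps, so standard extensions have to provide precise traps.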
  
..
  All Zve* extensions provide support for EEW of 8, 16, and 32, and
  Zve64* extensions also support EEW of 64.

All Zve* extensions provide support for EEW of 8, 16, and 32, and Zve64* extensions also support EEW of 64.
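
The parameters listed for the Zve* extensions can be captured as a small lookup table, for example in a configuration or test script.
This is a minimal sketch: the dictionary `ZVE` and the helper `supports` are hypothetical names, and the helper checks only element width and floating-point type, not the per-instruction exceptions described in the rest of this section.

.. code-block:: python

  ZVE = {
      "Zve32x": {"min_vlen": 32, "eew": (8, 16, 32), "fp32": False, "fp64": False},
      "Zve32f": {"min_vlen": 32, "eew": (8, 16, 32), "fp32": True, "fp64": False},
      "Zve64x": {"min_vlen": 64, "eew": (8, 16, 32, 64), "fp32": False, "fp64": False},
      "Zve64f": {"min_vlen": 64, "eew": (8, 16, 32, 64), "fp32": True, "fp64": False},
      "Zve64d": {"min_vlen": 64, "eew": (8, 16, 32, 64), "fp32": True, "fp64": True},
  }


  def supports(ext, eew, floating_point=False):
      """Return True if `ext` supports operands of width `eew`."""
      info = ZVE[ext]
      if eew not in info["eew"]:
          return False
      if not floating_point:
          return True
      # Floating-point operands exist only as FP32 and FP64.
      return {32: info["fp32"], 64: info["fp64"]}.get(eew, False)

For example, `supports("Zve64f", 64)` is true for integer operands, while `supports("Zve64f", 64, floating_point=True)` is false because only Zve64d provides FP64.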

..
  All Zve* extensions support the vector configuration instructions
  (Section :ref:`sec-vector-config` ).

All Zve* extensions support the vector configuration instructions (Section :ref:`sec-vector-config`).

..
  All Zve* extensions support all vector load and store instructions
  (Section :ref:`sec-vector-memory` ), except Zve64* extensions do not
  support EEW=64 for index values when XLEN=32.

All Zve* extensions support all vector load and store instructions (Section :ref:`sec-vector-memory`),
except that Zve64* extensions do not support EEW=64 for index values when XLEN=32.

..
  All Zve* extensions support all vector integer instructions (Section
  :ref:`sec-vector-integer` ), except that the `vmulh` integer multiply
  variants that return the high word of the product (`vmulh.vv`,
  `vmulh.vx`, `vmulhu.vv`, `vmulhu.vx`, `vmulhsu.vv`, `vmulhsu.vx`) are
  not included for EEW=64 in Zve64*.

All Zve* extensions support all vector integer instructions (Section :ref:`sec-vector-integer`),
except that the `vmulh` integer multiply variants that return the high word of the product (`vmulh.vv`, `vmulh.vx`, `vmulhu.vv`, `vmulhu.vx`, `vmulhsu.vv`, `vmulhsu.vx`) are not included for EEW=64 in Zve64*.

..
  NOTE: Producing the high-word of a product can take substantial
  additional gates for large EEW.

.. note::

  Producing the high word of a product can take substantial additional gates for large EEW.
  
..
  All Zve* extensions support all vector fixed-point arithmetic
  instructions (:ref:`sec-vector-fixed-point` ), except that `vsmul.vv` and
  `vsmul.vx` are not supported for EEW=64 in Zve64*.

All Zve* extensions support all vector fixed-point arithmetic instructions (Section :ref:`sec-vector-fixed-point`),
except that `vsmul.vv` and `vsmul.vx` are not supported for EEW=64 in Zve64*.

..
  NOTE: As with `vmulh`, `vsmul` requires a large amount of additional
  logic, and 64-bit fixed-point multiplies are relatively rare.

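.. note::

  As with `vmulh`, `vsmul` requires a large amount of additional logic, and 64-bit fixed-point multiplies are relatively rare.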
  
..
  All Zve* extensions support all vector integer single-width and
  widening reduction operations (Sections :ref:`sec-vector-integer-reduce` ,
  :ref:`sec-vector-integer-reduce-widen` ).

All Zve* extensions support all vector integer single-width and widening reduction operations (Sections :ref:`sec-vector-integer-reduce`, :ref:`sec-vector-integer-reduce-widen`).

..
  All Zve* extensions support all vector mask instructions (Section
  :ref:`sec-vector-mask` ).

All Zve* extensions support all vector mask instructions (Section :ref:`sec-vector-mask`).

..
  All Zve* extensions support all vector permutation instructions
  (Section :ref:`sec-vector-permute` ), except that Zve32x and Zve64x do not
  implement the floating-point scalar move instructions.

All Zve* extensions support all vector permutation instructions (Section :ref:`sec-vector-permute`),
except that Zve32x and Zve64x do not implement the floating-point scalar move instructions.

..
  The Zve32f and Zve64f extensions require the scalar processor to
  implement the F extension, and implement all vector floating-point
  instructions (Section :ref:`sec-vector-float` ) for floating-point
  operands with EEW=32 (i.e., no widening floating-point operations),
  and conversion instructions are provided to and from all supported
  integer EEWs.  Vector single-width floating-point reduction operations
  (:ref:`sec-vector-float-reduce` ) for EEW=32 are supported.

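The Zve32f and Zve64f extensions require the scalar processor to implement the F extension, and implement all vector floating-point instructions (Section :ref:`sec-vector-float`) for floating-point operands with EEW=32 (i.e., no widening floating-point operations).
Conversion instructions are provided to and from all supported integer EEWs.
Vector single-width floating-point reduction operations (Section :ref:`sec-vector-float-reduce`) for EEW=32 are supported.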

..
  The Zve32d and Zve64d extensions require the scalar processor to
  implement the D extension, and implement all vector floating-point
  instructions (Section :ref:`sec-vector-float` ) for floating-point
  operands with EEW=32 or EEW=64 (including widening instructions and
  conversions between FP32 and FP64). Vector single-width floating-point
  reductions (:ref:`sec-vector-float-reduce` ) for EEW=32 and EEW=64 are
  supported as well as widening reductions from FP32 to FP64.

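The Zve64d extension requires the scalar processor to implement the D extension, and implements all vector floating-point instructions (Section :ref:`sec-vector-float`) for floating-point operands with EEW=32 or EEW=64
(including widening instructions and conversions between FP32 and FP64).
Vector single-width floating-point reductions (Section :ref:`sec-vector-float-reduce`) for EEW=32 and EEW=64 are supported,
as are widening reductions from FP32 to FP64.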


*************************************************************************
V: Vector Extension for Application Processors
*************************************************************************

..
  The single-letter V extension is intended for use in application
  processor profiles.

The single-letter V extension is intended for use in application processor profiles.

..
  The V vector extension has precise traps.

The V vector extension has precise traps.

..
  The V vector extension requires that VLEN {ge} 128.

The V vector extension requires that VLEN ≥ 128.

..
  NOTE: The value of 128 was chosen as a compromise for application
  processors. Providing a larger VLEN allows stripmining code to be
  elided in some cases for short vectors, but also increases the size of
  the minimum implementation.  Note that larger LMUL can be used to
  avoid stripmining for longer known-size application vectors at the
  cost of having fewer available vector register groups. For example, an
  LMUL of 8 allows vectors of up to sixteen 64-bit elements to be
  processed without stripmining using four vector register groups.

.. note::

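  The value of 128 was chosen as a compromise for application processors.
  Providing a larger VLEN allows stripmining code to be elided in some cases for short vectors,
  but also increases the size of the minimum implementation.
  Note that larger LMUL can be used to avoid stripmining for longer known-size application vectors,
  at the cost of having fewer available vector register groups.
  For example, an LMUL of 8 allows vectors of up to sixteen 64-bit elements to be processed without stripmining using four vector register groups.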
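
As a worked check of the LMUL=8 figures in the note, assuming the minimum VLEN of 128, SEW=64, and the 32 architectural vector registers:

.. math::

  \mathrm{VLMAX} = \frac{\mathrm{LMUL} \times \mathrm{VLEN}}{\mathrm{SEW}} = \frac{8 \times 128}{64} = 16,
  \qquad
  \frac{32\ \text{registers}}{\mathrm{LMUL}} = \frac{32}{8} = 4\ \text{register groups}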
  
..
  The V extension supports EEW of 8, 16, and 32, and 64.

The V extension supports EEW of 8, 16, 32, and 64.

..
  The V extension supports the vector configuration instructions
  (Section :ref:`sec-vector-config` ).

The V extension supports the vector configuration instructions `vsetvli`, `vsetivli`, and `vsetvl` (Section :ref:`sec-vector-config`).

..
  The V extension supports all vector load and store instructions
  (Section :ref:`sec-vector-memory` ), except the V extension does not
  support EEW=64 for index values when XLEN=32.

The V extension supports all vector load and store instructions (Section :ref:`sec-vector-memory`),
except that the V extension does not support EEW=64 for index values when XLEN=32.

..
  The V extension supports all vector integer instructions (Section
  :ref:`sec-vector-integer` ).

The V extension supports all vector integer instructions (Section :ref:`sec-vector-integer`).

..
  The V extension supports all vector fixed-point arithmetic
  instructions (:ref:`sec-vector-fixed-point` ).

The V extension supports all vector fixed-point arithmetic instructions (Section :ref:`sec-vector-fixed-point`).

..
  The V extension supports all vector integer single-width and
  widening reduction operations (Sections :ref:`sec-vector-integer-reduce` ,
  :ref:`sec-vector-integer-reduce-widen` ).

The V extension supports all vector integer single-width and widening reduction operations
(Sections :ref:`sec-vector-integer-reduce`, :ref:`sec-vector-integer-reduce-widen`).

..
  The V extension supports all vector mask instructions (Section
  :ref:`sec-vector-mask` ).

The V extension supports all vector mask instructions (Section :ref:`sec-vector-mask`).

..
  The V extension supports all vector permutation instructions (Section
  :ref:`sec-vector-permute` ).

The V extension supports all vector permutation instructions (Section :ref:`sec-vector-permute`).

..
  The V extension requires the scalar processor to implement the F and D
  extensions, and implements all vector floating-point instructions
  (Section :ref:`sec-vector-float` ) for floating-point operands with EEW=32
  or EEW=64 (including widening instructions and conversions between
  FP32 and FP64). Vector single-width floating-point reductions
  (:ref:`sec-vector-float-reduce` ) for EEW=32 and EEW=64 are supported as
  well as widening reductions from FP32 to FP64.

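The V extension requires the scalar processor to implement the F and D extensions, and implements all vector floating-point instructions (Section :ref:`sec-vector-float`) for floating-point operands with EEW=32 or EEW=64 (including widening instructions and conversions between FP32 and FP64).
Vector single-width floating-point reductions (Section :ref:`sec-vector-float-reduce`) for EEW=32 and EEW=64 are supported, as are widening reductions from FP32 to FP64.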


############################
Vector Instruction Listing
############################

include::inst-table.adoc[]

include::vector-examples.adoc[]

include::calling-convention.adoc[]