Sentence Embedding Computation

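Langboat Sentence Embedding is built on Langboat's in-house Mengzi lightweight model technology. It outputs sentence-level vector representations of text that capture the semantic similarity between texts, and is suited to tasks such as clustering, regression, anomaly detection, and visualization.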
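
As a rough illustration of how sentence vectors are commonly compared, the sketch below computes cosine similarity in plain Python. The function name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions, not part of the service; vectors returned by the API are much longer.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumption for illustration).
v1 = [0.1, 0.3, -0.2]
v2 = [0.2, 0.6, -0.4]   # points in the same direction as v1
v3 = [-0.3, 0.1, 0.0]   # orthogonal to v1
print(cosine_similarity(v1, v2))
print(cosine_similarity(v1, v3))
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values close to 0 indicate unrelated directions.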

Request URL

Send a POST request to:

https://open.langboat.com

Communication Protocol

All API endpoints of the Langboat sentence embedding service communicate over HTTPS, which provides a secure communication channel.

Usage Metering

Usage is counted by the number of successful calls.

Request Parameters

Request Headers


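Header | Required | Description
Accept | | Fixed value: application/json
Authorization | | Authentication information used to verify that the request is legitimate, in the format AccessKey:signature.
Content-Type | | Fixed value: application/json
Content-MD5 | | The 128-bit MD5 digest of the HTTP message body, encoded as Base64.
Date | | Request time in GMT format, e.g. Wed, 20 Apr 2022 17:01:00 GMT.
x-langboat-signature-nonce | | A unique random number used to prevent network replay attacks. Use a different value for each request.
x-langboat-signature-method | | Signature method. Only HMAC-SHA256 is currently supported.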
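
A minimal sketch of producing the Date and x-langboat-signature-nonce values described above. The helper names `http_date` and `new_nonce` are hypothetical, and the 5-digit nonce range is an assumption modeled on the example values in this document; the service's actual accepted range is not stated.

```python
import calendar
import secrets
from email.utils import formatdate


def http_date(epoch_seconds=None) -> str:
    """Format a Unix timestamp (default: now) as an HTTP date in GMT.

    formatdate always emits English day and month names, so the result
    does not depend on the process locale.
    """
    return formatdate(timeval=epoch_seconds, usegmt=True)


def new_nonce() -> str:
    """Return a fresh random 5-digit decimal string (assumed range)."""
    return str(10000 + secrets.randbelow(90000))


# Fixed timestamp for 2022-07-20 13:04:02 UTC, used only for illustration.
fixed = calendar.timegm((2022, 7, 20, 13, 4, 2))
print(http_date(fixed))  # Wed, 20 Jul 2022 13:04:02 GMT
print(new_nonce())
```
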
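
Signature Calculation
  1. Compute the MD5 digest of the request body, Base64-encode the raw digest bytes, and set the result as the Content-MD5 header.

Example:

  • body = "" (empty string)
  • MD5(body) = d41d8cd98f00b204e9800998ecf8427e (the 16 raw digest bytes, shown in hexadecimal)
  • Base64(MD5(body)) = 1B2M2Y8AsgTpgAmY7PhCfg==

Content-MD5 header: Content-MD5: 1B2M2Y8AsgTpgAmY7PhCfg==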
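
The Content-MD5 computation can be sketched with Python's standard library as follows. The function name `content_md5` is a hypothetical helper; the key point is that the raw digest bytes are Base64-encoded, not the hexadecimal string.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Base64-encode the raw 16-byte MD5 digest of the request body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


print(content_md5(b""))  # 1B2M2Y8AsgTpgAmY7PhCfg==
```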

  2. Build the header string to sign, headerStringToSign, from the request headers:
headerStringToSign =
HTTP-Verb + "\n" + // HTTP-Verb: only POST is supported
Accept + "\n" + // Accept: application/json
Content-MD5 + "\n" + // the Base64-encoded MD5 value computed in step 1
Content-Type + "\n" + // Content-Type: application/json
Date + "\n" + // Date: request time in GMT
x-langboat-signature-method + "\n" + // only HMAC-SHA256 is supported
x-langboat-signature-nonce + "\n";

headerStringToSign example:

POST
application/json
1B2M2Y8AsgTpgAmY7PhCfg==
application/json
Wed, 20 Jul 2022 13:04:02 GMT
HMAC-SHA256
10191
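
A minimal sketch of assembling this header string, using the values from the example above. The function name `build_header_string` is hypothetical; note that every field, including the final nonce, is followed by a newline.

```python
def build_header_string(content_md5: str, date: str, nonce: str) -> str:
    """Join the signed header values in the documented order, one per line."""
    fields = [
        "POST",              # HTTP verb
        "application/json",  # Accept
        content_md5,         # Content-MD5
        "application/json",  # Content-Type
        date,                # Date (GMT)
        "HMAC-SHA256",       # x-langboat-signature-method
        nonce,               # x-langboat-signature-nonce
    ]
    return "".join(field + "\n" for field in fields)


header_string = build_header_string(
    "1B2M2Y8AsgTpgAmY7PhCfg==", "Wed, 20 Jul 2022 13:04:02 GMT", "10191"
)
print(header_string, end="")
```
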
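  3. Build the query string to sign, queryToSign, from the request's query parameters. Sort the query parameters (everything after the ?) in ascending lexicographic order and join them with &. For example: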
action=embedSentences&sentences={"data":["道可道非常道"]}
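
A minimal sketch of building queryToSign. The function name `build_query_to_sign` is hypothetical, and signing the raw (not URL-encoded) values is an assumption based on how the sample clients in this document build the string.

```python
import json
from typing import Dict


def build_query_to_sign(queries: Dict[str, str]) -> str:
    """Sort parameters by key and join raw key=value pairs with '&'."""
    return "&".join(f"{key}={value}" for key, value in sorted(queries.items()))


# Compact separators reproduce the JSON string shown in the example above.
sentences = json.dumps(
    {"data": ["道可道非常道"]}, ensure_ascii=False, separators=(",", ":")
)
query_to_sign = build_query_to_sign(
    {"sentences": sentences, "action": "embedSentences"}
)
print(query_to_sign)
```
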
  4. Concatenate the two normalized strings built in the previous two steps to form the string to sign:
stringToSign = headerStringToSign + queryToSign;
  5. Compute the signature. Compute the HMAC of stringToSign as defined in RFC 2104, Base64-encode the HMAC value, and prefix it with the AccessKey to obtain the Authorization value:
Signature = Base64(HMAC-SHA256( AccessSecret, UTF-8-Encoding-Of(stringToSign)))
Authorization = AccessKey + ":" + Signature

Signature example: po/vsPI0RcvY/eu4bohhxxADHyPj4/rcglLTQEBtHQM=

Authorization example: Authorization: 7Bo9ByyiTWRC1Y8KJJQ9cWtNpZLmrgyb:po/vsPI0RcvY/eu4bohhxxADHyPj4/rcglLTQEBtHQM=
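
The signing step can be sketched with Python's standard library as below. The function names `sign` and `authorization_header` and the placeholder credentials are illustrative assumptions; the example signature above cannot be reproduced here because its AccessSecret is not published.

```python
import base64
import hashlib
import hmac


def sign(access_secret: str, string_to_sign: str) -> str:
    """Base64(HMAC-SHA256(AccessSecret, UTF-8 bytes of stringToSign))."""
    digest = hmac.new(
        access_secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key: str, access_secret: str,
                         string_to_sign: str) -> str:
    """Build the Authorization value: AccessKey + ':' + Signature."""
    return f"{access_key}:{sign(access_secret, string_to_sign)}"


# Placeholder credentials and input, for illustration only.
print(authorization_header("demo-key", "demo-secret", "POST\n...\naction=embedSentences"))
```

A SHA-256 HMAC is 32 bytes long, so the Base64-encoded signature is always 44 characters and ends with a single `=`.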

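Query Parameters


Query parameter | Required | Description
action | | Fixed value: embedSentences
sentences | | Required. The list of sentences to encode. Number of sentences: [1, 5]; length of each sentence: [1, 512]. Pass the list as a JSON string, e.g. {"data":["sentence 1", "sentence 2"]}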
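
A minimal sketch of building the sentences parameter while enforcing the limits above on the client side. The function name `build_sentences_param` is hypothetical, and measuring sentence length in characters (Python `len`) is an assumption, since the unit of the length limit is not stated.

```python
import json
from typing import List

MAX_SENTENCES = 5          # documented limit: 1 to 5 sentences
MAX_SENTENCE_LENGTH = 512  # documented limit: 1 to 512 per sentence


def build_sentences_param(sentences: List[str]) -> str:
    """Validate the sentence list and serialize it as a JSON string."""
    if not 1 <= len(sentences) <= MAX_SENTENCES:
        raise ValueError(f"expected 1 to {MAX_SENTENCES} sentences, got {len(sentences)}")
    for index, sentence in enumerate(sentences):
        # Assumption: the limit is counted in characters, not bytes or tokens.
        if not 1 <= len(sentence) <= MAX_SENTENCE_LENGTH:
            raise ValueError(f"sentence {index} has invalid length {len(sentence)}")
    # ensure_ascii=False keeps non-ASCII text readable in the query string.
    return json.dumps({"data": sentences}, ensure_ascii=False, separators=(",", ":"))


print(build_sentences_param(["句子1", "句子2"]))
```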

Request example:

POST https://open.langboat.com/?action=embedSentences
&sentences={"data":["道可道非常道"]}
Content-Type: application/json
Content-MD5: TgEoPbXNLtJLSucCVfFsYQ
Date: Tue, 19 Apr 2022 10:03:46 GMT
Accept: application/json
x-langboat-signature-method: HMAC-SHA256
x-langboat-signature-nonce: 43785
Authorization: aAM9NHSkWY40wkd7EUg6HFuFzJpJPG6E:ZoRXXiZscO/dCeDdPC1mrQ++8kQDKwJZmqFa3pby+mc=

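Response

Response Body

The response body is a JSON object. When code is 0 (success), the embeddings value in the data field is the list of vectors corresponding to the input sentence list; each vector has 1024 dimensions. Example structure: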

{
  "code": 0,
  "message": "success",
  "requestId": "e317e29e68c1f86de3beca6620377c27",
  "data": {
    "embeddings": [
      [
        -0.124582164,
        -0.07033879,
        -0.4081061,
        -0.09000866,
        ...
      ]
    ]
  }
}
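
A minimal sketch of reading this response structure. The function name `extract_embeddings` is hypothetical, and the shortened 4-dimensional sample payload is made up for illustration; real responses carry 1024-dimensional vectors.

```python
import json
from typing import List

EXPECTED_DIM = 1024  # vector dimension stated for this service


def extract_embeddings(response_text: str,
                       expected_dim: int = EXPECTED_DIM) -> List[List[float]]:
    """Parse a response body and return one vector per input sentence."""
    payload = json.loads(response_text)
    if payload.get("code") != 0:
        raise RuntimeError(
            f"request failed: code={payload.get('code')} "
            f"message={payload.get('message')} requestId={payload.get('requestId')}"
        )
    embeddings = payload["data"]["embeddings"]
    for index, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise ValueError(f"vector {index} has {len(vector)} dimensions, expected {expected_dim}")
    return embeddings


# Made-up sample with a shortened vector, for illustration only.
sample = json.dumps({
    "code": 0,
    "message": "success",
    "requestId": "demo",
    "data": {"embeddings": [[-0.12, -0.07, -0.41, -0.09]]},
})
vectors = extract_embeddings(sample, expected_dim=4)
print(len(vectors), len(vectors[0]))  # 1 4
```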

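HTTP Status Codes and Business Codes

The possible HTTP response status codes and business (error) codes are:

HTTP status code | Business code | Description
200 | 0 | Success
400 | 10400 | Bad request
401 | 10401 | Authentication failed; check that apiKey and apiSecret are correct
403 | 10403 | Insufficient permissions; check whether the service has been activated
422 | 10422 | Invalid parameters; check the request parameters
429 | 10429 | Request limit exceeded (QPS, character count, or call count over the limit)
500 | 10500 | Service error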
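
A minimal sketch of turning these business codes into exceptions on the client side. The names `LangboatAPIError` and `raise_for_business_code` are hypothetical, and treating 10429 and 10500 as retryable is an assumption; the service documentation does not prescribe a retry policy.

```python
from typing import Any, Dict

# Business codes and short descriptions, taken from the table above.
BUSINESS_CODES = {
    0: "success",
    10400: "bad request",
    10401: "authentication failed",
    10403: "insufficient permissions",
    10422: "invalid parameters",
    10429: "request limit exceeded",
    10500: "service error",
}

# Assumption: only rate-limit and server-side errors are worth retrying.
RETRYABLE_CODES = frozenset({10429, 10500})


class LangboatAPIError(Exception):
    """Raised when the response carries a non-zero business code."""

    def __init__(self, code, message, request_id):
        super().__init__(f"[{code}] {message} (requestId={request_id})")
        self.code = code
        self.message = message
        self.request_id = request_id
        self.retryable = code in RETRYABLE_CODES


def raise_for_business_code(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the payload unchanged on success, otherwise raise."""
    code = payload.get("code")
    if code == 0:
        return payload
    message = payload.get("message") or BUSINESS_CODES.get(code, "unknown error")
    raise LangboatAPIError(code, message, payload.get("requestId"))
```

A caller could catch `LangboatAPIError`, inspect `retryable`, and back off before retrying, while logging `request_id` for support requests.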

Error Response Examples

  • Authentication failed
{
"code": 10401,
"message": "鉴权失败,核对apiKey和apiSecret 是否正确",
"requestId": "33f0057e-f421-fb94-766f-608d837969ca"
}
  • Invalid parameter: unsupported action
{
"code": 10422,
"message": "参数错误,核对请求参数, 不支持的action : generateTemplate",
"requestId": "6cc3f2a9-5fd1-4872-9c59-d6f10ceb1e62"
}
  • Insufficient permissions
{
"code": 10403,
"message": "权限不足,查看是否开通服务",
"requestId": "6aa60868-d3f1-a7e5-8fbb-e2f96d65e7b5"
}

Sample Code

Java (JDK-11)

import cn.hutool.core.util.URLUtil;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.text.SimpleDateFormat;
import java.util.*;

public class LangboatOpenClient {
    private final String accessKey;
    private final String accessSecret;
    private final String url;

    public LangboatOpenClient(String accessKey, String accessSecret) {
        this(accessKey, accessSecret, "https://open.langboat.com");
    }

    public LangboatOpenClient(String accessKey, String accessSecret, String url) {
        this.accessKey = accessKey;
        this.accessSecret = accessSecret;
        this.url = url;
    }

    /*
     * Compute the MD5 digest and Base64-encode it
     */
    private String MD5Base64(String s) {
        if (s == null)
            return null;
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Use an explicit charset so the digest does not depend on the platform default
            byte[] md5Bytes = md.digest(s.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(md5Bytes);
        } catch (Exception e) {
            throw new IllegalStateException("Failed to generate MD5 : " + e.getMessage(), e);
        }
    }

    /*
     * Compute HMAC-SHA256 and Base64-encode it
     */
    private String HMACSha256Base64(String data, String key) {
        try {
            SecretKeySpec signingKey = new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA256");
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(signingKey);
            byte[] rawHmac = mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(rawHmac);
        } catch (Exception e) {
            throw new IllegalStateException("Failed to generate HMAC : " + e.getMessage(), e);
        }
    }

    /*
     * Format the time as a GMT string
     */
    private String toGMTString(Date date) {
        // Locale.US yields English day and month names (e.g. "Wed, 20 Jul 2022 ...");
        // Locale.CHINA would produce Chinese names that do not match the documented format.
        SimpleDateFormat df = new SimpleDateFormat("E, dd MMM yyyy HH:mm:ss z", Locale.US);
        df.setTimeZone(new SimpleTimeZone(0, "GMT"));
        return df.format(date);
    }

    public Object inference(Map<String, String> queries) {
        PrintWriter out = null;
        BufferedReader in = null;
        StringBuilder result = new StringBuilder();
        try {
            // Build the URL-encoded query string for the request URL
            StringBuilder queriesStr = new StringBuilder();
            queries.forEach((k, v) -> queriesStr.append("&").append(k).append("=").append(URLUtil.encode(v)));
            queriesStr.setCharAt(0, '?');
            URL openUrl = new URL(this.url + queriesStr);
            String body = "";
            String method = "POST";
            String accept = "application/json";
            String contentType = "application/json";
            String date = toGMTString(new Date());
            // 1. Compute MD5 + Base64 of the body
            String bodyMd5 = MD5Base64(body);
            String nonce = "" + (int) (Math.random() * 65535);
            String headerToSign = method + "\n" + accept + "\n" + bodyMd5 + "\n"
                    + contentType + "\n" + date + "\n"
                    + "HMAC-SHA256\n"
                    + nonce + "\n";
            // 2. Compute queryToSign (raw values, sorted in ascending order)
            List<String> queriesList = new ArrayList<>();
            queries.forEach((k, v) -> queriesList.add(k + "=" + v));
            Collections.sort(queriesList);
            String queryToSign = String.join("&", queriesList);
            // 3. Compute stringToSign
            String stringToSign = headerToSign + queryToSign;
            // 4. Compute HMAC-SHA256 + Base64
            String signature = HMACSha256Base64(stringToSign, this.accessSecret);
            // 5. Build the Authorization header value
            String authorization = this.accessKey + ":" + signature;
            URLConnection conn = openUrl.openConnection();
            conn.setRequestProperty("Accept", accept);
            conn.setRequestProperty("Content-Type", contentType);
            conn.setRequestProperty("Content-MD5", bodyMd5);
            conn.setRequestProperty("Date", date);
            conn.setRequestProperty("Authorization", authorization);
            conn.setRequestProperty("x-langboat-signature-nonce", nonce);
            conn.setRequestProperty("x-langboat-signature-method", "HMAC-SHA256");
            // POST
            conn.setDoOutput(true);
            conn.setDoInput(true);
            out = new PrintWriter(conn.getOutputStream());
            // Send the request body
            out.print(body);
            // Flush the output stream buffer
            out.flush();
            // Read the response (or the error stream on a non-200 status)
            InputStream is;
            HttpURLConnection httpConn = (HttpURLConnection) conn;
            if (httpConn.getResponseCode() == 200) {
                is = httpConn.getInputStream();
            } else {
                is = httpConn.getErrorStream();
            }
            in = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) {
                result.append(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (out != null) {
                    out.close();
                }
                if (in != null) {
                    in.close();
                }
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws JsonProcessingException {
        LangboatOpenClient client = new LangboatOpenClient("<Your Access Key>", "<Your Access Secret>");
        ObjectMapper mapper = new ObjectMapper();
        // Sentence embeddings
        Map<String, String> queries = Map.of(
                "action", "embedSentences",
                "sentences", mapper.writeValueAsString(
                        Map.of("data", new String[]{"道可道非常道,名可名非常名", "博学之,审问之,慎思之,明辨之,笃行之。"}))
        );
        Object o = client.inference(queries);
        System.out.println(o);
    }
}

Python (>=3.6)

import base64
import datetime
import hashlib
import hmac
import json
import random
from typing import Tuple

import requests


class LangboatOpenClient:
    """Langboat open platform client."""

    def __init__(self,
                 access_key: str,
                 access_secret: str,
                 url: str = "https://open.langboat.com"):
        self.access_key = access_key
        self.access_secret = access_secret
        self.url = url

    def _build_header(self, query: dict, body: bytes) -> dict:
        accept = "application/json"
        # 1. MD5 + Base64 of the exact body bytes that will be sent
        content_md5 = base64.b64encode(hashlib.md5(body).digest()).decode()
        content_type = "application/json"
        gmt_format = '%a, %d %b %Y %H:%M:%S GMT'
        date = datetime.datetime.now(datetime.timezone.utc).strftime(gmt_format)
        signature_method = "HMAC-SHA256"
        signature_nonce = str(random.randint(0, 65535))
        header_string = f"POST\n{accept}\n{content_md5}\n{content_type}\n" \
                        f"{date}\n{signature_method}\n{signature_nonce}\n"
        # 2. Compute queryToSign (parameters sorted by key)
        queries_str = []
        for k, v in sorted(query.items(), key=lambda item: item[0]):
            if isinstance(v, list):
                for i in v:
                    queries_str.append(f"{k}={i}")
            else:
                queries_str.append(f"{k}={v}")
        queries_string = '&'.join(queries_str)
        # 3. Compute stringToSign
        sign_string = header_string + queries_string
        # 4. Compute HMAC-SHA256 + Base64 to obtain the signature
        secret_bytes = self.access_secret.encode("utf-8")
        signature = base64.b64encode(
            hmac.new(secret_bytes, sign_string.encode("utf-8"), hashlib.sha256).digest()
        ).decode()
        # 5. Assemble the request headers, including Authorization
        return {
            "Content-Type": content_type,
            "Content-MD5": content_md5,
            "Date": date,
            "Accept": accept,
            "X-Langboat-Signature-Method": signature_method,
            "X-Langboat-Signature-Nonce": signature_nonce,
            "Authorization": f"{self.access_key}:{signature}"
        }

    def inference(self, queries: dict, data: dict) -> Tuple[int, dict]:
        """
        Call the API.
        :param queries: query parameters
        :param data: request body data
        :return: response status, response body parsed from JSON
        """
        # Serialize once so the signed bytes and the sent bytes are identical
        body = json.dumps(data).encode("utf-8")
        headers = self._build_header(queries, body)
        response = requests.post(url=self.url, headers=headers, params=queries, data=body)
        return response.status_code, response.json()


if __name__ == '__main__':
    _access_key = '<Your access_key>'
    _access_secret = '<Your access_secret>'
    client = LangboatOpenClient(
        access_key=_access_key,
        access_secret=_access_secret
    )
    _data = {}
    # Sentence embeddings
    _queries = {
        "action": "embedSentences",
        "sentences": json.dumps(
            {"data": ["道可道非常道,名可名非常名", "博学之,审问之,慎思之,明辨之,笃行之。"]},
            ensure_ascii=False
        )
    }
    status_code, result = client.inference(_queries, _data)
    print("response status:", status_code)
    print("response json:", json.dumps(result, ensure_ascii=False, indent=2))

Go (>=1.14)

package main

import (
	"crypto/hmac"
	"crypto/md5"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io/ioutil"
	"log"
	"math/rand"
	"net/http"
	"net/url"
	"sort"
	"strings"
	"time"
)

func main() {
	client := OpenClient{
		baseURL:      "https://open.langboat.com",
		accessKey:    "Your_Access_Key",
		accessSecret: "Your_Access_Secret",
	}
	// Sentence embeddings
	sentences, err := json.Marshal(map[string][]string{
		"data": {"道可道非常道,名可名非常名", "博学之,审问之,慎思之,明辨之,笃行之。"},
	})
	if err != nil {
		log.Fatal("fail to encode sentences: ", err)
	}
	queries := map[string]string{
		"action":    "embedSentences",
		"sentences": string(sentences),
	}
	response, err := client.inference(queries)
	if err != nil {
		log.Fatal("request failed: ", err)
	}
	body, err := ioutil.ReadAll(response.Body)
	response.Body.Close()
	if err != nil {
		log.Fatal("fail to read response body: ", err)
	}
	log.Println(string(body))
}

type OpenClient struct {
	baseURL      string
	accessKey    string
	accessSecret string
}

// inference sends a signed POST request and returns the raw HTTP response.
// The caller is responsible for closing the response body.
func (c *OpenClient) inference(queries map[string]string) (*http.Response, error) {
	// URL-encode the query parameters
	values := url.Values{}
	for k, v := range queries {
		values.Set(k, v)
	}
	rawQuery := values.Encode()
	targetURL := c.baseURL + "?" + rawQuery
	client := &http.Client{
		Timeout: 15 * time.Second,
	}
	// Build the header values
	rng := rand.New(rand.NewSource(time.Now().UnixNano())) // seed so the nonce differs between runs
	var (
		payload = strings.NewReader("")
		date    = time.Now().UTC().Format(http.TimeFormat)
		nonce   = fmt.Sprint(10000 + rng.Intn(89999))
	)
	// Sign
	signParam := SignParam{
		Body:    "",
		Query:   rawQuery,
		DateGMT: date,
		Nonce:   nonce,
	}
	contentMD5, signature := GenSignature(signParam, c.accessSecret)
	// Set the headers
	headers := map[string]string{
		"Authorization":               c.accessKey + ":" + signature,
		"Content-Type":                "application/json",
		"Accept":                      "application/json",
		"Date":                        date,
		"Content-MD5":                 contentMD5,
		"x-langboat-signature-method": "HMAC-SHA256",
		"x-langboat-signature-nonce":  nonce,
	}
	req, err := http.NewRequest("POST", targetURL, payload)
	if err != nil {
		return nil, err
	}
	for k, v := range headers {
		req.Header.Add(k, v)
	}
	return client.Do(req)
}

// SignParam holds the parameters needed to generate a signature
type SignParam struct {
	Body    string // body data
	Query   string // original query
	DateGMT string // GMT time
	Nonce   string // random number
}

func getMD5(str string) []byte {
	h := md5.New()
	h.Write([]byte(str))
	return h.Sum(nil)
}

func hmacSha256(data string, secret string) []byte {
	h := hmac.New(sha256.New, []byte(secret))
	h.Write([]byte(data))
	return h.Sum(nil)
}

// resortQuery parses the query and re-encodes it with keys in ascending order
func resortQuery(src string) string {
	queries, _ := url.ParseQuery(src)
	keys := make([]string, 0)
	for k := range queries {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	newQuery := url.Values{}
	for _, k := range keys {
		for _, value := range queries[k] {
			newQuery.Add(k, value)
		}
	}
	return newQuery.Encode()
}

// GenSignature generates the Content-MD5 value and the signature
func GenSignature(src SignParam, apiSecret string) (string, string) {
	// Compute the MD5 digest of the body
	md5str := getMD5(src.Body)
	// Base64-encode it to obtain contentMD5
	contentMD5 := base64.StdEncoding.EncodeToString(md5str)
	// Parse the query, sort it lexicographically, and unescape it for signing
	query := resortQuery(src.Query)
	query, _ = url.QueryUnescape(query)
	// Layout of the string to sign
	stringToSign := `POST
application/json
%s
application/json
%s
HMAC-SHA256
%s
%s`
	stringToSign = fmt.Sprintf(stringToSign, contentMD5, src.DateGMT, src.Nonce, query)
	hmac256 := hmacSha256(stringToSign, apiSecret)
	signature := base64.StdEncoding.EncodeToString(hmac256)
	return contentMD5, signature
}
Langboat Technology © 2021
Beijing Public Security Network Filing No. 11010802035393, Beijing ICP Filing No. 2021021087
