terry-staging #11
258
docs/link-preview.md
Normal file
@@ -0,0 +1,258 @@
# Link preview (`/api/link-preview`)

Telegram-style rich card for the **first URL** found in a post's text.
Front-end renders a single clickable card showing site name, title,
description, and a thumbnail; the data is fetched from a back-end proxy
that scrapes Open Graph / oEmbed / Twitter Card metadata once and caches
it.

> **Scope**: only the first link in the post text gets a preview, matching
> Telegram's behaviour. Any additional URLs in the same post still render
> as inline autolinks but do not get their own card.

## Why a back-end proxy

Browsers cannot read arbitrary cross-origin pages (the same-origin policy
blocks the response unless the target opts in via CORS), so OG metadata must
be fetched server-side. A single proxy endpoint keeps secrets / outbound IPs
on the server and lets us cache so each URL is only scraped once for the
whole audience.

---

## Endpoint contract

```
GET /api/link-preview?url=<encoded-absolute-url>
```

| Query | Required | Notes |
| ----- | -------- | ----- |
| `url` | yes      | Absolute `http://` or `https://` URL. Must be percent-encoded (e.g. with `encodeURIComponent`) so query strings inside the target URL survive the round trip. |

### Success — `200 OK`

```json
{
  "url": "https://app.safe.global/welcome",
  "canonicalUrl": "https://app.safe.global/welcome",
  "siteName": "app.safe.global",
  "title": "Safe{Wallet}",
  "description": "Safe{Wallet} is the most trusted smart account wallet on Ethereum with over $100B secured.",
  "imageUrl": "https://app.safe.global/og.png",
  "imageWidth": 1200,
  "imageHeight": 630,
  "favicon": "https://app.safe.global/favicon.ico",
  "themeColor": "#12FF80",
  "fetchedAt": "2026-05-29T10:00:00Z",
  "cacheTtlSeconds": 86400
}
```

- All string fields except `url` may be empty. The front-end gracefully hides
  rows that are missing (e.g. no `imageUrl` → image area is omitted).
- `url` echoes the original input so the client can match the response
  against the URL it asked about, even when several requests are in flight
  and responses arrive out of order.
- `canonicalUrl` is the URL the client should open when the card is tapped.
  Defaults to `url` if no `<link rel=canonical>` was found.
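
As a rough illustration of the contract above, the sketch below builds the request path with `encodeURIComponent` and uses the echoed `url` to discard stale responses. `buildPreviewPath` and `isResponseFor` are hypothetical helper names chosen for this example; they are not part of the existing codebase.

```ts
// Hypothetical helpers; names are illustrative only.

/** Builds the request path, percent-encoding the target URL. */
function buildPreviewPath(targetUrl: string): string {
  return `/api/link-preview?url=${encodeURIComponent(targetUrl)}`;
}

/** Minimal view of the response: only the field needed for matching. */
type PreviewEcho = { url: string };

/** True when a response belongs to the URL the client is still interested in. */
function isResponseFor(response: PreviewEcho, requestedUrl: string): boolean {
  return response.url === requestedUrl;
}
```

Encoding the whole target URL keeps its own `?` and `&` from being parsed as parameters of the proxy request.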

### Already cached / freshly cached — same shape

The endpoint is idempotent and the response shape is identical whether
the metadata is hot, warm, or freshly scraped.

### Errors

| Status | When                                                                          | Body shape                                      |
| ------ | ----------------------------------------------------------------------------- | ----------------------------------------------- |
| `400`  | Missing / invalid / non-http(s) `url`                                         | `{ "error": "invalid_url" }`                    |
| `422`  | URL passed validation but resolves to a private/internal address (SSRF guard) | `{ "error": "blocked_target" }`                 |
| `404`  | Target returned 404 or fetch produced no metadata                             | `{ "error": "not_found" }`                      |
| `408`  | Target took longer than the timeout to respond                                | `{ "error": "timeout" }`                        |
| `502`  | Target returned 5xx                                                           | `{ "error": "upstream_error" }`                 |
| `429`  | Rate limit on this client / IP                                                | `{ "error": "rate_limited", "retryAfter": 60 }` |

The front-end treats every non-`200` as “no preview available” and
silently renders nothing. No toasts. URLs already render as inline
clickable text via `autolink`, so the user is never blocked.
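
A minimal sketch of that client-side rule, assuming the caller already has the HTTP status and the parsed JSON body in hand. `toPreviewOrNull` and the `PreviewBody` shape are hypothetical names for this example, and rejecting a `200` body without a string `url` is an extra assumption, not part of the contract.

```ts
// Hypothetical shape: only the fields this sketch inspects.
type PreviewBody = { url: string; [field: string]: unknown };

/** Maps any response to either a usable preview body or null ("render nothing"). */
function toPreviewOrNull(status: number, body: unknown): PreviewBody | null {
  // Every non-200 (400, 404, 408, 422, 429, 502, ...) means "no preview available".
  if (status !== 200) return null;
  // Assumption: a 200 without a string `url` is treated as unusable too.
  if (typeof body !== "object" || body === null) return null;
  const candidate = body as { url?: unknown };
  return typeof candidate.url === "string" ? (body as PreviewBody) : null;
}
```

Collapsing every failure into `null` keeps the bubble code to a single branch: card or no card.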

---

## Caching strategy

Store one row per `canonicalUrl` (or normalized `url` if `canonicalUrl` is
absent). Suggested TTLs:

- Successful preview: **24 hours** (`cacheTtlSeconds: 86400`).
- 404 / timeout / blocked: **6 hours** negative cache. Without it, every
  request for a failing URL triggers a fresh scrape, so a transient outage
  on the target site hammers the proxy.
- Send `Cache-Control: public, max-age=86400` so CDN / browser also cache.
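
The TTL choices above could be expressed roughly as follows. This is a sketch only: `ScrapeOutcome`, `ttlSecondsFor`, and `isFresh` are hypothetical names, and the outcome labels are assumptions loosely based on the error codes in the contract.

```ts
// Hypothetical outcome labels, loosely mirroring the error codes in the contract.
type ScrapeOutcome = "ok" | "not_found" | "timeout" | "blocked_target";

const POSITIVE_TTL_SECONDS = 24 * 60 * 60; // 24 hours = 86400
const NEGATIVE_TTL_SECONDS = 6 * 60 * 60; // 6 hours = 21600

/** Picks the cache lifetime for a stored result. */
function ttlSecondsFor(outcome: ScrapeOutcome): number {
  return outcome === "ok" ? POSITIVE_TTL_SECONDS : NEGATIVE_TTL_SECONDS;
}

/** True while a cached row may still be served without re-scraping. */
function isFresh(storedAtMs: number, ttlSeconds: number, nowMs: number): boolean {
  return nowMs - storedAtMs < ttlSeconds * 1000;
}
```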

Cache key normalization:

- Lowercase scheme + host.
- Strip the trailing slash when the path is just `/`.
- Strip `utm_*`, `ref`, `referrer`, `fbclid`, `gclid` query params.
- Keep the rest of the query and fragment as-is.
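
One possible reading of these rules, sketched with the standard `URL` API. `normalizeCacheKey` is a hypothetical name. Note one caveat: deleting a parameter through `URLSearchParams` re-serializes the remaining query, so the "as-is" rule only holds exactly when nothing is stripped.

```ts
const TRACKING_PARAMS = new Set(["ref", "referrer", "fbclid", "gclid"]);

/** Hypothetical helper: derives a cache key from a raw absolute URL. */
function normalizeCacheKey(rawUrl: string): string {
  // URL parsing already lowercases the scheme and host.
  const u = new URL(rawUrl);

  // Collect names first, then delete, to avoid mutating while iterating.
  const toDelete: string[] = [];
  u.searchParams.forEach((_value, name) => {
    if (name.startsWith("utm_") || TRACKING_PARAMS.has(name)) toDelete.push(name);
  });
  for (const name of toDelete) u.searchParams.delete(name);

  // Drop the slash only when it is the whole path.
  const path = u.pathname === "/" ? "" : u.pathname;
  return `${u.protocol}//${u.host}${path}${u.search}${u.hash}`;
}
```

For example, `HTTPS://Example.COM/?utm_source=x&id=7#top` and `https://example.com?id=7#top` map to the same key under this sketch.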

---

## SSRF and abuse guard (must-have)

The proxy will fetch any URL the front-end asks about, which makes it a
server-side request forgery (SSRF) vector. Apply the following guards to
every outbound request:

1. Resolve the host to all of its A/AAAA records.
2. Reject if any resolved IP is in: loopback, link-local, private
   (RFC1918), `0.0.0.0/8`, multicast, broadcast, or the internal cluster
   CIDR.
3. Reject schemes other than `http` and `https`.
4. Cap response body at **5 MB**; abort on overflow.
5. Cap request total time at **5 s**; abort on timeout.
6. Cap redirect chain at **3 hops**; re-validate target IP at each hop.
7. Do not forward client cookies, auth headers, or `Referer` to the target.
8. Use a clear `User-Agent` such as `ArkLibraryLinkBot/1.0 (+https://ark-library.com/bot)`.
9. Per-client (IP or session) rate limit, e.g. 60 req / min.
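
Step 2 might look roughly like the following for IPv4 addresses. This is a partial sketch under stated assumptions: `isBlockedIPv4` and its helpers are hypothetical names, IPv6 ranges and the internal cluster CIDR are deliberately left out, and the check would still need to run on every resolved record and at every redirect hop.

```ts
/** Parses dotted-quad IPv4 into an unsigned integer, or null if malformed. */
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    n = n * 256 + octet;
  }
  return n;
}

// [base address, prefix length] for the ranges named in step 2 (IPv4 only).
const BLOCKED_V4: Array<[string, number]> = [
  ["0.0.0.0", 8], // "this network"
  ["10.0.0.0", 8], // RFC1918
  ["127.0.0.0", 8], // loopback
  ["169.254.0.0", 16], // link-local
  ["172.16.0.0", 12], // RFC1918
  ["192.168.0.0", 16], // RFC1918
  ["224.0.0.0", 4], // multicast
  ["255.255.255.255", 32], // limited broadcast
];

/** True when both addresses share the same leading `prefix` bits. */
function inCidr(ip: number, base: number, prefix: number): boolean {
  const blockSize = Math.pow(2, 32 - prefix);
  return Math.floor(ip / blockSize) === Math.floor(base / blockSize);
}

/** Fails closed: anything that does not parse as IPv4 is treated as blocked. */
function isBlockedIPv4(ip: string): boolean {
  const n = ipv4ToInt(ip);
  if (n === null) return true;
  return BLOCKED_V4.some(([base, prefix]) => inCidr(n, ipv4ToInt(base) as number, prefix));
}
```

Plain arithmetic is used instead of bitwise operators because JavaScript bitwise operations work on signed 32-bit integers, which complicates comparisons for addresses above `127.255.255.255`.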

---

## Metadata extraction precedence

For each field, pick the first present:

| Field          | Sources (in order)                                                                                        |
| -------------- | --------------------------------------------------------------------------------------------------------- |
| `title`        | `og:title` → `twitter:title` → `<title>` → empty                                                          |
| `description`  | `og:description` → `twitter:description` → `<meta name="description">` → empty                            |
| `imageUrl`     | `og:image:secure_url` → `og:image` → `twitter:image` → first prominent `<img>` (skip if <200×200) → empty |
| `siteName`     | `og:site_name` → `application-name` → hostname (sans `www.`)                                              |
| `canonicalUrl` | `<link rel="canonical">` → request URL                                                                    |
| `favicon`      | `<link rel="icon">` → `<link rel="shortcut icon">` → `/favicon.ico`                                       |
| `themeColor`   | `<meta name="theme-color">`                                                                               |

Resolve any relative URLs (`og:image`, `favicon`, `canonical`) against the
final response URL (after redirects).
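
The "first present wins" rule and the relative-URL resolution can be sketched as below. The example assumes the HTML has already been parsed into a flat key/value map (`MetaMap`, with the `<title>` text stored under a `title` key); that map, and the names `firstPresent`, `resolveAgainst`, and `extractTitleAndImage`, are hypothetical. The `<img>` fallback is omitted.

```ts
// Assumption: tags were already parsed into a flat map, e.g. { "og:title": "..." }.
type MetaMap = Record<string, string | undefined>;

/** Returns the first non-empty value among `keys`, or "" when none is present. */
function firstPresent(meta: MetaMap, keys: string[]): string {
  for (const key of keys) {
    const value = meta[key]?.trim();
    if (value) return value;
  }
  return "";
}

/** Resolves a possibly relative URL against the final (post-redirect) page URL. */
function resolveAgainst(finalUrl: string, maybeRelative: string): string {
  if (!maybeRelative) return "";
  try {
    return new URL(maybeRelative, finalUrl).href;
  } catch {
    return ""; // unparseable value: treat as missing
  }
}

function extractTitleAndImage(meta: MetaMap, finalUrl: string) {
  return {
    title: firstPresent(meta, ["og:title", "twitter:title", "title"]),
    imageUrl: resolveAgainst(
      finalUrl,
      firstPresent(meta, ["og:image:secure_url", "og:image", "twitter:image"]),
    ),
  };
}
```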

---

## Provider quirks worth handling

Handling these quirks up front saves a lot of "why doesn't this site
preview?" debugging later.

- **Twitter / X**: `x.com` and `twitter.com` strip OG when not signed in. Use
  the public oEmbed endpoint
  `https://publish.twitter.com/oembed?url=...&omit_script=1` for
  Twitter/X URLs and map: `title = author_name`, `description = html` stripped
  to text, `imageUrl = thumbnail_url` if available.
- **YouTube**: prefer `https://noembed.com/embed?url=...` or
  `https://www.youtube.com/oembed?url=...&format=json` (no key).
- **Reddit / Mastodon**: standard OG works fine.
- **Sites behind Cloudflare bot challenge**: surface 502 to the client.
  Don't retry immediately; let the negative-cache TTL absorb it.
- **AMP pages**: prefer `og:url` when present so the cached entry points to
  the canonical page, not the AMP variant.
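
A sketch of how the Twitter/X and YouTube routing might be wired, using only the endpoints and field mappings listed above. `oEmbedEndpointFor` and `mapTwitterOEmbed` are hypothetical names, the host list is an assumption (it ignores variants such as `m.youtube.com`), and the tag-stripping regex is deliberately crude and not a real HTML sanitizer.

```ts
/** Returns the oEmbed endpoint for known providers, or null to use plain OG scraping. */
function oEmbedEndpointFor(targetUrl: string): string | null {
  const host = new URL(targetUrl).hostname.toLowerCase().replace(/^www\./, "");
  const encoded = encodeURIComponent(targetUrl);
  if (host === "x.com" || host === "twitter.com") {
    return `https://publish.twitter.com/oembed?url=${encoded}&omit_script=1`;
  }
  if (host === "youtube.com" || host === "youtu.be") {
    return `https://www.youtube.com/oembed?url=${encoded}&format=json`;
  }
  return null;
}

// Only the oEmbed fields used by the mapping described in the list above.
type TwitterOEmbed = { author_name?: string; html?: string; thumbnail_url?: string };

/** Maps a Twitter/X oEmbed payload onto preview fields. */
function mapTwitterOEmbed(payload: TwitterOEmbed) {
  const text = (payload.html || "")
    .replace(/<[^>]*>/g, " ") // crude tag removal, for illustration only
    .replace(/\s+/g, " ")
    .trim();
  return {
    title: payload.author_name || "",
    description: text,
    imageUrl: payload.thumbnail_url || "",
  };
}
```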

---

## Front-end integration

### Type addition (`src/types/post.ts`)

```ts
export type LinkPreview = {
  url: string;
  canonicalUrl: string;
  siteName: string;
  title: string;
  description: string;
  imageUrl?: string;
  imageWidth?: number;
  imageHeight?: number;
  favicon?: string;
  themeColor?: string;
};

export type Post = {
  // ...existing fields
  /** Preview for the first URL in `text`. At most one per post. */
  linkPreview?: LinkPreview;
};
```

### Which URL gets previewed

The back-end picks the **first** URL it finds in `text` using the same
regex as the front-end's `autolink` (`/(https?:\/\/[^\s<>"]+[^\s<>".,;:!?)\]}'])/i`).
Only that URL is fetched, stored, and returned as `post.linkPreview`. Any
later URLs in the same post are ignored for preview purposes (still
clickable inline via `autolink`).
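
For illustration, the selection rule can be written as a small function around the regex quoted above. `firstUrl` is a hypothetical name; the regex is copied from the text. The final character class means trailing punctuation such as a comma or full stop is left out of the match.

```ts
// Same pattern as quoted above; non-global, so it always returns the first match.
const AUTOLINK_RE = /(https?:\/\/[^\s<>"]+[^\s<>".,;:!?)\]}'])/i;

/** Returns the first URL in `text`, or null when the text contains none. */
function firstUrl(text: string): string | null {
  const match = AUTOLINK_RE.exec(text);
  return match ? match[1] : null;
}
```

With the input `Docs: https://example.com/a?b=1, mirror https://example.org/x.` this returns `https://example.com/a?b=1`, without the trailing comma, and ignores the second link.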

### Where data comes from

Two viable paths; pick one when wiring the back-end.

1. **Inline on `Post`** (preferred): the post API enriches each post with
   `linkPreview`. The first URL in `text` is resolved once at write time
   (or lazily on first read with a background job). The client renders
   without making any extra request.
2. **Client-side lookup**: the client extracts the first URL via the
   existing `autolink` regex, calls `/api/link-preview?url=...` once per
   post (with in-memory dedupe across posts that share the same URL), and
   renders the card when the response comes back. Slower first paint but
   keeps the posts endpoint cheap.

We recommend (1) for the public feed, keeping `/api/link-preview` available
for (2) only on admin previews.
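
If path (2) is used, the in-memory dedupe could be as small as a map of in-flight promises. This is a sketch under assumptions: `createDedupedLoader` is a hypothetical name, and the `load` callback stands in for whatever function actually calls `/api/link-preview`.

```ts
type Loader<T> = (url: string) => Promise<T>;

/** Wraps a loader so posts sharing the same URL share a single request. */
function createDedupedLoader<T>(load: Loader<T>): Loader<T> {
  const requests = new Map<string, Promise<T>>();
  return (url) => {
    const existing = requests.get(url);
    if (existing) return existing;
    // Entries are kept for the session, including failed ones, so a URL
    // without a preview is not re-requested by every post that links to it.
    const pending = load(url);
    requests.set(url, pending);
    return pending;
  };
}
```

Storing the promise rather than the resolved value means two posts rendered in the same tick still trigger only one request.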

### Rendering

- New component: `src/components/messageStream/LinkPreviewCard.tsx`
- Renders a card with a left vertical 3px accent bar (`themeColor` →
  fallback `bg-ark-gold`).
- Layout:

  ```
  ┌──────────────────────────────────────────────────┐
  │ ▍ siteName (12px, neutral-400)                   │
  │ ▍ Title (15px, bold, neutral-100)                │
  │ ▍ Description (13px, neutral-300, 3-line clamp)  │
  │ ▍ ┌────────────────────────────────────────────┐ │
  │ ▍ │ imageUrl (lazy, aspect-video, rounded)     │ │
  │ ▍ └────────────────────────────────────────────┘ │
  └──────────────────────────────────────────────────┘
  ```

- Whole card is `<a href={canonicalUrl} target="_blank" rel="noopener noreferrer">`.
- Reuse the bubble background (`bg-[#272632]` is OK, slightly lift with
  `bg-white/[0.03]` overlay so the card reads as inset within the bubble).
- Mount points (text-bearing bubbles only): `TextBubble`,
  `ImageWithTextBubble`, `AlbumBubble`, `VideoBubble`, `FileDocBubble`.
  Render below the existing `CollapsibleText` so cards stay visible even
  when long text is collapsed.

### Picking the URL to preview

If `post.linkPreview` is present, render that single card. Otherwise the
bubble renders nothing extra (URLs still autolink inline). On the inline
path the front-end never picks the URL itself; that decision lives on the
back-end so the client and server agree on which URL was chosen.

### Falling back gracefully

- No `imageUrl` → omit the image area, keep the text block.
- Title shorter than 8 characters → hide the description below (treat as
  a low-confidence preview).
- Title empty and description empty → render nothing.
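
These rules can be captured in one pure function that decides which parts of the card to show. This is a sketch with one interpretation made explicit: the 8-character rule is applied only when a title is present, so a preview with a description but no title still shows its description. `cardPartsFor` and the two types are hypothetical names.

```ts
type PreviewText = { title: string; description: string; imageUrl?: string };
type CardParts = { showTitle: boolean; showDescription: boolean; showImage: boolean };

/** Returns which parts to render, or null when the card should not render at all. */
function cardPartsFor(preview: PreviewText): CardParts | null {
  const title = preview.title.trim();
  const description = preview.description.trim();

  // Rule 3: nothing useful to show.
  if (!title && !description) return null;

  // Rule 2 (assumed to apply only when a title exists): short title = low confidence.
  const titleIsLowConfidence = title.length > 0 && title.length < 8;

  return {
    showTitle: title.length > 0,
    showDescription: description.length > 0 && !titleIsLowConfidence,
    showImage: Boolean(preview.imageUrl), // Rule 1: no imageUrl, no image area
  };
}
```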

---

## Open questions for the back-end

- Where in the stack will OG extraction live? Existing post pipeline, a
  worker queue, or inline on read?
- Storage: a new `link_previews` table keyed by `canonicalUrl`, with a
  `post_link_previews` join table preserving original URL order, or just a
  JSON column on `posts`?
- How aggressive should re-scrape be? E.g. re-scrape every 30 days for
  successful previews, every 24 hours for `themeColor` updates.
- Should admins be able to override / hide a preview per post? Telegram has
  a "no preview" toggle and editors often want it.
- Do we want a manual "refresh preview" button in the admin UI?
65
src/components/messageStream/LinkPreviewCard.tsx
Normal file
@@ -0,0 +1,65 @@
import type { LinkPreview } from "../../types/post";

/**
 * Telegram-style rich preview card for a single URL embedded in a post.
 *
 * Renders an accent bar on the left, then site name → title → description,
 * with an optional thumbnail at the bottom. The whole card is one anchor
 * that opens `canonicalUrl` in a new tab.
 */
export function LinkPreviewCard({ preview }: { preview: LinkPreview }) {
  const accent = preview.themeColor || "#EEB726";
  const hasUsefulText =
    preview.title.length > 0 || preview.description.length > 0;
  if (!hasUsefulText && !preview.imageUrl) return null;

  return (
    <a
      href={preview.canonicalUrl || preview.url}
      target="_blank"
      rel="noopener noreferrer"
      className="group block overflow-hidden rounded-lg bg-white/[0.04] transition hover:bg-white/[0.07]"
    >
      <div className="flex">
        <div
          aria-hidden
          className="w-[3px] shrink-0 rounded-l-lg"
          style={{ backgroundColor: accent }}
        />
        <div className="min-w-0 flex-1 px-3 py-2.5">
          {preview.siteName ? (
            <div
              className="truncate text-[12px] leading-4"
              style={{ color: accent }}
            >
              {preview.siteName}
            </div>
          ) : null}
          {preview.title ? (
            <div className="mt-0.5 line-clamp-2 break-words text-[14px] font-semibold leading-5 text-neutral-100">
              {preview.title}
            </div>
          ) : null}
          {preview.description ? (
            <div className="mt-1 line-clamp-3 break-words text-[13px] leading-[18px] text-neutral-300">
              {preview.description}
            </div>
          ) : null}
          {preview.imageUrl ? (
            <div className="mt-2 overflow-hidden rounded-md bg-black/30">
              <img
                src={preview.imageUrl}
                alt=""
                loading="lazy"
                decoding="async"
                width={preview.imageWidth}
                height={preview.imageHeight}
                className="block aspect-[1.91/1] w-full object-cover transition duration-300 group-hover:scale-[1.02]"
              />
            </div>
          ) : null}
        </div>
      </div>
    </a>
  );
}
@@ -6,6 +6,7 @@ import { ImageBubble } from "./bubbles/ImageBubble";
import { ImageWithTextBubble } from "./bubbles/ImageWithTextBubble";
import { AlbumBubble } from "./bubbles/AlbumBubble";
import { VideoBubble } from "./bubbles/VideoBubble";
import { LinkPreviewCard } from "./LinkPreviewCard";
import { formatDateTime } from "./utils/formatTime";

type BubbleComponent = ComponentType<{ post: Post }>;
@@ -41,6 +42,11 @@ export function MessageBubble({ post }: { post: Post }) {
      }`}
    >
      <Bubble post={post} />
      {post.linkPreview ? (
        <div className={isVisual ? "px-4 pt-3" : "mt-3"}>
          <LinkPreviewCard preview={post.linkPreview} />
        </div>
      ) : null}
      <time
        dateTime={post.publishedAt}
        className={`block text-right text-[12px] leading-[19px] text-[#A8A9AE] ${
@@ -156,6 +156,19 @@ export const MOCK_POSTS: Post[] = [
    isRecommended: false,
    publishedAt: "2026-01-19T16:20:00.000Z",
    updatedAt: "2026-01-19T16:20:00.000Z",
    // Mock: only the FIRST URL in the text is previewed.
    linkPreview: {
      url: "https://coinmarketcap.com/currencies/ark-defai/",
      canonicalUrl: "https://coinmarketcap.com/currencies/ark-defai/",
      siteName: "coinmarketcap.com",
      title: "ARK DeFAI Price, Chart & Market Cap",
      description:
        "Track ARK DeFAI live price, market cap, volume and historical chart on CoinMarketCap. Verified contract address, holders and on-chain analytics.",
      imageUrl: img(81, 1200, 630),
      imageWidth: 1200,
      imageHeight: 630,
      themeColor: "#2962FF",
    },
  },

  // 6) Plain text + single link (short announcement)
@@ -254,6 +267,15 @@ export const MOCK_POSTS: Post[] = [
    categorySlug: "meeting",
    language: "zh-CN",
    text: "📌 ARK DeFAI 方舟晨间时刻\n\n🧠 会议主题:市场概况交流 & 市场问题讨论。\n🕙 会议时间:3月1日(日)10:00\n🎬 直播腾讯会议链接:https://meeting.tencent.com/l/G718S4Sedm38",
    linkPreview: {
      url: "https://meeting.tencent.com/l/G718S4Sedm38",
      canonicalUrl: "https://meeting.tencent.com/l/G718S4Sedm38",
      siteName: "meeting.tencent.com",
      title: "腾讯会议 · ARK DeFAI 方舟晨间时刻",
      description:
        "点击直接加入直播会议。需要 App 或浏览器插件。会议号会在点击后自动补全。",
      themeColor: "#0080FF",
    },
    attachments: [
      {
        id: "a-010",
@@ -34,6 +34,24 @@ export type Attachment = {
  thumbnailUrl?: string;
};

/**
 * Preview metadata for the first URL found in a post's text. See
 * `docs/link-preview.md` for the back-end contract.
 */
export type LinkPreview = {
  url: string;
  canonicalUrl: string;
  siteName: string;
  title: string;
  description: string;
  imageUrl?: string;
  imageWidth?: number;
  imageHeight?: number;
  favicon?: string;
  /** Hex color used for the left accent bar (e.g. "#12FF80"). */
  themeColor?: string;
};

export type Post = {
  id: string;
  postType?: PostType | string;
@@ -49,6 +67,8 @@ export type Post = {
  updatedAt?: string;
  createdAt?: string;
  tags?: string[];
  /** Preview card for the first URL in `text`. At most one per post. */
  linkPreview?: LinkPreview;
};

export type PostListResponse = {