diff --git a/docs/link-preview.md b/docs/link-preview.md new file mode 100644 index 0000000..1b2876b --- /dev/null +++ b/docs/link-preview.md @@ -0,0 +1,258 @@ +# Link preview (`/api/link-preview`) + +Telegram-style rich card for the **first URL** found in a post's text. +Front-end renders a single clickable card showing site name, title, +description, and a thumbnail; the data is fetched from a back-end proxy +that scrapes Open Graph / oEmbed / Twitter Card metadata once and caches +it. + +> **Scope**: only the first link in the post text gets a preview, matching +> Telegram's behaviour. Any additional URLs in the same post still render +> as inline autolinks but do not get their own card. + +## Why a back-end proxy + +Browsers cannot read arbitrary cross-origin pages (the same-origin policy +blocks the response unless the target opts in via CORS), so OG metadata must +be fetched server-side. A single proxy endpoint keeps secrets / outbound IPs on +the server and lets us cache so each URL is only scraped once for the whole +audience. + +--- + +## Endpoint contract + +``` +GET /api/link-preview?url=<percent-encoded target URL> +``` + +| Query | Required | Notes | +| ----- | -------- | ------------------------------------------------------------------------------------------------------------------------------- | +| `url` | yes | Absolute `http://` or `https://` URL. Must be URI-encoded (percent-encoded) so query strings inside the target URL survive the round trip. | + +### Success: `200 OK` + +```json +{ + "url": "https://app.safe.global/welcome", + "canonicalUrl": "https://app.safe.global/welcome", + "siteName": "app.safe.global", + "title": "Safe{Wallet}", + "description": "Safe{Wallet} is the most trusted smart account wallet on Ethereum with over $100B secured.", + "imageUrl": "https://app.safe.global/og.png", + "imageWidth": 1200, + "imageHeight": 630, + "favicon": "https://app.safe.global/favicon.ico", + "themeColor": "#12FF80", + "fetchedAt": "2026-05-29T10:00:00Z", + "cacheTtlSeconds": 86400 +} +``` + +- All string fields except `url` may be empty. 
The front-end gracefully hides + rows whose fields are missing (e.g. no `imageUrl` → image area is omitted). +- `url` echoes the original input so the client can match the response + against the URL it asked about, even when several requests are in flight + and responses arrive out of order. +- `canonicalUrl` is the URL the client should open when the card is tapped. + Defaults to `url` if no `<link rel="canonical">` was found. + +### Already cached / freshly cached: same shape + +The endpoint is idempotent and the response shape is identical whether +the metadata is hot, warm, or freshly scraped. + +### Errors + +| Status | When | Body shape | +| ------ | --------------------------------------------------- | --------------------------------------------------------------------------- | +| `400` | Missing / invalid / non-http(s) `url` | `{ "error": "invalid_url" }` | +| `422` | URL passed validation but resolves to a private/internal address (SSRF guard) | `{ "error": "blocked_target" }` | +| `404` | Target returned 404 or fetch produced no metadata | `{ "error": "not_found" }` | +| `408` | Target took longer than the timeout to respond | `{ "error": "timeout" }` | +| `502` | Target returned 5xx | `{ "error": "upstream_error" }` | +| `429` | Rate limit on this client / IP | `{ "error": "rate_limited", "retryAfter": 60 }` | + +The front-end treats every non-`200` as “no preview available” and +silently renders nothing. No toasts. URLs already render as inline +clickable text via `autolink`, so the user is never blocked. + +--- + +## Caching strategy + +Store one row per `canonicalUrl` (or normalized `url` if `canonicalUrl` is +absent). Suggested TTLs: + +- Successful preview: **24 hours** (`cacheTtlSeconds: 86400`). +- 404 / timeout / blocked: **6 hours** negative cache. Without it, every + request for a failing target would trigger a fresh outbound fetch from the + proxy. +- Send `Cache-Control: public, max-age=86400` so CDN / browser also cache. + +Cache key normalization: +- Lowercase scheme + host. +- Strip the trailing slash on the path when it's the only character. 
+- Strip `utm_*`, `ref`, `referrer`, `fbclid`, `gclid` query params. +- Keep the rest of the query and fragment as-is. + +--- + +## SSRF and abuse guard (must-have) + +The proxy will fetch any URL the front-end asks about, which makes it a +server-side request forgery (SSRF) vector. +Before issuing the outbound request: + +1. Resolve the host to all of its A/AAAA records. +2. Reject if any resolved IP is in: loopback, link-local, private + (RFC1918), `0.0.0.0/8`, multicast, broadcast, or the internal cluster + CIDR. +3. Reject schemes other than `http` and `https`. +4. Cap response body at **5 MB**; abort on overflow. +5. Cap total request time at **5 s**; abort on timeout. +6. Cap redirect chain at **3 hops**; re-validate target IP at each hop. +7. Do not forward client cookies, auth headers, or `Referer` to the target. +8. Use a clear `User-Agent` such as `ArkLibraryLinkBot/1.0 (+https://ark-library.com/bot)`. +9. Per-client (IP or session) rate limit, e.g. 60 req / min. + +--- + +## Metadata extraction precedence + +For each field, pick the first present: + +| Field | Sources (in order) | +| ------------- | -------------------------------------------------------------------------------------------------------- | +| `title` | `og:title` → `twitter:title` → `<title>` → empty | +| `description` | `og:description` → `twitter:description` → `<meta name="description">` → empty | +| `imageUrl` | `og:image:secure_url` → `og:image` → `twitter:image` → first prominent `<img>` (skip if <200×200) → empty | +| `siteName` | `og:site_name` → `application-name` → hostname (sans `www.`) | +| `canonicalUrl`| `<link rel="canonical">` → request URL | +| `favicon` | `<link rel="icon">` → `<link rel="shortcut icon">` → `/favicon.ico` | +| `themeColor` | `<meta name="theme-color">` | + +Resolve any relative URLs (`og:image`, `favicon`, `canonical`) against the +final response URL (after redirects). + +--- + +## Provider quirks worth handling + +These quirks save a lot of "why doesn't this site preview?" debugging later. 
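As a rough sketch of the oEmbed routing that the Twitter/X and YouTube items in this section call for, the proxy could choose an endpoint up front and fall back to plain OG scraping for everything else. The helper name `oembedEndpointFor` and the two hostname sets are assumptions made for illustration and are not part of the contract; the endpoint URLs are the ones named in this section.

```typescript
// Hypothetical helper (not part of the documented contract).
// The hostname sets are assumptions; adjust to the hosts you actually see.
const TWITTER_HOSTS = new Set(["twitter.com", "www.twitter.com", "x.com", "www.x.com"]);
const YOUTUBE_HOSTS = new Set(["youtube.com", "www.youtube.com", "youtu.be"]);

/**
 * Returns the oEmbed endpoint to query for `target`, or null when the
 * proxy should fall back to regular Open Graph scraping.
 */
function oembedEndpointFor(target: string): string | null {
  let host: string;
  try {
    host = new URL(target).hostname.toLowerCase();
  } catch {
    return null; // not an absolute URL, nothing to route
  }
  // The target is itself a query value, so it must be percent-encoded.
  const encoded = encodeURIComponent(target);
  if (TWITTER_HOSTS.has(host)) {
    return `https://publish.twitter.com/oembed?url=${encoded}&omit_script=1`;
  }
  if (YOUTUBE_HOSTS.has(host)) {
    return `https://www.youtube.com/oembed?url=${encoded}&format=json`;
  }
  return null;
}
```

A `null` result means "scrape OG tags as usual", so adding another provider should only require one more branch and leave the generic path untouched.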
+ +- **Twitter / X**: `x.com` and `twitter.com` strip OG when not signed in. Use + the public oEmbed endpoint + `https://publish.twitter.com/oembed?url=...&omit_script=1` for + Twitter/X URLs and map: `title = author_name`, `description = html` stripped + to text, `imageUrl = thumbnail_url` if available. +- **YouTube**: prefer `https://noembed.com/embed?url=...` or + `https://www.youtube.com/oembed?url=...&format=json` (no key). +- **Reddit / Mastodon**: standard OG works fine. +- **Sites behind Cloudflare bot challenge**: surface 502 to the client. + Don't retry hot — let the negative-cache TTL absorb it. +- **AMP pages**: prefer `og:url` when present so the cached entry points to + the canonical page, not the AMP variant. + +--- + +## Front-end integration + +### Type addition (`src/types/post.ts`) + +```ts +export type LinkPreview = { + url: string; + canonicalUrl: string; + siteName: string; + title: string; + description: string; + imageUrl?: string; + imageWidth?: number; + imageHeight?: number; + favicon?: string; + themeColor?: string; +}; + +export type Post = { + // ...existing fields + /** Preview for the first URL in `text`. At most one per post. */ + linkPreview?: LinkPreview; +}; +``` + +### Which URL gets previewed + +The back-end picks the **first** URL it finds in `text` using the same +regex as the front-end's `autolink` (`/(https?:\/\/[^\s<>"]+[^\s<>".,;:!?)\]}'])/i`). +Only that URL is fetched, stored, and returned as `post.linkPreview`. Any +later URLs in the same post are ignored for preview purposes (still +clickable inline via `autolink`). + +### Where data comes from + +Two viable paths — pick one when wiring the back-end. + +1. **Inline on `Post`** (preferred): the post API enriches each post with + `linkPreview`. The first URL in `text` is resolved once at write time + (or lazily on first read with a background job). The client renders + without making any extra request. +2. 
**Client-side lookup**: the client extracts the first URL via the + existing `autolink` regex, calls `/api/link-preview?url=...` once per + post (with in-memory dedupe across posts that share the same URL), and + renders the card when the response comes back. Slower first paint but + keeps the posts endpoint cheap. + +Recommend (1) for the public feed and keep `/api/link-preview` available for +(2) only on admin previews. + +### Rendering + +- New component: `src/components/messageStream/LinkPreviewCard.tsx` +- Renders a card with a left vertical 3px accent bar (`themeColor` → + fallback `bg-ark-gold`). +- Layout: + + ``` + ┌──────────────────────────────────────────────────┐ + │ ▍ siteName (12px, neutral-400) │ + │ ▍ Title (15px, bold, neutral-100) │ + │ ▍ Description (13px, neutral-300, 3-line clamp) │ + │ ▍ ┌────────────────────────────────────────────┐ │ + │ ▍ │ imageUrl (lazy, aspect-video, rounded) │ │ + │ ▍ └────────────────────────────────────────────┘ │ + └──────────────────────────────────────────────────┘ + ``` + +- Whole card is `<a href={canonicalUrl} target="_blank" rel="noopener noreferrer">`. +- Reuse the bubble background (`bg-[#272632]` is OK, slightly lift with + `bg-white/[0.03]` overlay so the card reads as inset within the bubble). +- Mount points (text-bearing bubbles only): `TextBubble`, + `ImageWithTextBubble`, `AlbumBubble`, `VideoBubble`, `FileDocBubble`. + Render below the existing `CollapsibleText` so cards stay visible even + when long text is collapsed. + +### Picking the URL to preview + +If `post.linkPreview` is present, render that single card. Otherwise the +bubble renders nothing extra (URLs still autolink inline). The front-end +never picks the URL itself — that decision lives on the back-end so the +client and server agree on which URL was chosen. + +### Falling back gracefully + +- No `imageUrl` → omit the image area, keep the text block. 
+- Title shorter than 8 characters → hide the description below (treat as + a low-confidence preview). +- Title empty and description empty → render nothing. + +--- + +## Open questions for the back-end + +- Where in the stack will OG extraction live? Existing post pipeline, a + worker queue, or inline on read? +- Storage: a new `link_previews` table keyed by `canonicalUrl`, with a + `post_link_previews` join table preserving original URL order, or just a + JSON column on `posts`? +- How aggressive should re-scrape be? E.g. re-scrape every 30 days for + successful previews, every 24 hours for `themeColor` updates. +- Should admin be able to override / hide a preview per post? Telegram has + a "no preview" toggle and editors often want it. +- Do we want a manual "refresh preview" button in the admin UI? diff --git a/src/components/messageStream/LinkPreviewCard.tsx b/src/components/messageStream/LinkPreviewCard.tsx new file mode 100644 index 0000000..3858464 --- /dev/null +++ b/src/components/messageStream/LinkPreviewCard.tsx @@ -0,0 +1,65 @@ +import type { LinkPreview } from "../../types/post"; + +/** + * Telegram-style rich preview card for a single URL embedded in a post. + * + * Renders an accent bar on the left, then site name → title → description, + * with an optional thumbnail at the bottom. The whole card is one anchor + * that opens `canonicalUrl` in a new tab. 
+ */ +export function LinkPreviewCard({ preview }: { preview: LinkPreview }) { + const accent = preview.themeColor || "#EEB726"; + const hasUsefulText = + preview.title.length > 0 || preview.description.length > 0; + if (!hasUsefulText && !preview.imageUrl) return null; + + return ( + <a + href={preview.canonicalUrl || preview.url} + target="_blank" + rel="noopener noreferrer" + className="group block overflow-hidden rounded-lg bg-white/[0.04] transition hover:bg-white/[0.07]" + > + <div className="flex"> + <div + aria-hidden + className="w-[3px] shrink-0 rounded-l-lg" + style={{ backgroundColor: accent }} + /> + <div className="min-w-0 flex-1 px-3 py-2.5"> + {preview.siteName ? ( + <div + className="truncate text-[12px] leading-4" + style={{ color: accent }} + > + {preview.siteName} + </div> + ) : null} + {preview.title ? ( + <div className="mt-0.5 line-clamp-2 break-words text-[14px] font-semibold leading-5 text-neutral-100"> + {preview.title} + </div> + ) : null} + {preview.description ? ( + <div className="mt-1 line-clamp-3 break-words text-[13px] leading-[18px] text-neutral-300"> + {preview.description} + </div> + ) : null} + {preview.imageUrl ? 
( + <div className="mt-2 overflow-hidden rounded-md bg-black/30"> + <img + src={preview.imageUrl} + alt="" + loading="lazy" + decoding="async" + width={preview.imageWidth} + height={preview.imageHeight} + className="block aspect-[1.91/1] w-full object-cover transition duration-300 group-hover:scale-[1.02]" + /> + </div> + ) : null} + </div> + </div> + </a> + ); +} diff --git a/src/components/messageStream/MessageBubble.tsx b/src/components/messageStream/MessageBubble.tsx index 29db8b8..d786abc 100644 --- a/src/components/messageStream/MessageBubble.tsx +++ b/src/components/messageStream/MessageBubble.tsx @@ -6,6 +6,7 @@ import { ImageBubble } from "./bubbles/ImageBubble"; import { ImageWithTextBubble } from "./bubbles/ImageWithTextBubble"; import { AlbumBubble } from "./bubbles/AlbumBubble"; import { VideoBubble } from "./bubbles/VideoBubble"; +import { LinkPreviewCard } from "./LinkPreviewCard"; import { formatDateTime } from "./utils/formatTime"; type BubbleComponent = ComponentType<{ post: Post }>; @@ -41,6 +42,11 @@ export function MessageBubble({ post }: { post: Post }) { }`} > <Bubble post={post} /> + {post.linkPreview ? ( + <div className={isVisual ? "px-4 pt-3" : "mt-3"}> + <LinkPreviewCard preview={post.linkPreview} /> + </div> + ) : null} <time dateTime={post.publishedAt} className={`block text-right text-[12px] leading-[19px] text-[#A8A9AE] ${ diff --git a/src/mocks/mockPosts.ts b/src/mocks/mockPosts.ts index 8e86149..6c8cc97 100644 --- a/src/mocks/mockPosts.ts +++ b/src/mocks/mockPosts.ts @@ -156,6 +156,19 @@ export const MOCK_POSTS: Post[] = [ isRecommended: false, publishedAt: "2026-01-19T16:20:00.000Z", updatedAt: "2026-01-19T16:20:00.000Z", + // Mock: only the FIRST URL in the text is previewed. 
+ linkPreview: { + url: "https://coinmarketcap.com/currencies/ark-defai/", + canonicalUrl: "https://coinmarketcap.com/currencies/ark-defai/", + siteName: "coinmarketcap.com", + title: "ARK DeFAI Price, Chart & Market Cap", + description: + "Track ARK DeFAI live price, market cap, volume and historical chart on CoinMarketCap. Verified contract address, holders and on-chain analytics.", + imageUrl: img(81, 1200, 630), + imageWidth: 1200, + imageHeight: 630, + themeColor: "#2962FF", + }, }, // 6) 纯文本 + 单链接(简短公告) @@ -254,6 +267,15 @@ export const MOCK_POSTS: Post[] = [ categorySlug: "meeting", language: "zh-CN", text: "📌 ARK DeFAI 方舟晨间时刻\n\n🧠 会议主题:市场概况交流 & 市场问题讨论。\n🕙 会议时间:3月1日(日)10:00\n🎬 直播腾讯会议链接:https://meeting.tencent.com/l/G718S4Sedm38", + linkPreview: { + url: "https://meeting.tencent.com/l/G718S4Sedm38", + canonicalUrl: "https://meeting.tencent.com/l/G718S4Sedm38", + siteName: "meeting.tencent.com", + title: "腾讯会议 · ARK DeFAI 方舟晨间时刻", + description: + "点击直接加入直播会议。需要 App 或浏览器插件。会议号会在点击后自动补全。", + themeColor: "#0080FF", + }, attachments: [ { id: "a-010", diff --git a/src/types/post.ts b/src/types/post.ts index 8ac3418..14d94bc 100644 --- a/src/types/post.ts +++ b/src/types/post.ts @@ -34,6 +34,24 @@ export type Attachment = { thumbnailUrl?: string; }; +/** + * Preview metadata for the first URL found in a post's text. See + * `docs/link-preview.md` for the back-end contract. + */ +export type LinkPreview = { + url: string; + canonicalUrl: string; + siteName: string; + title: string; + description: string; + imageUrl?: string; + imageWidth?: number; + imageHeight?: number; + favicon?: string; + /** Hex color used for the left accent bar (e.g. "#12FF80"). */ + themeColor?: string; +}; + export type Post = { id: string; postType?: PostType | string; @@ -49,6 +67,8 @@ export type Post = { updatedAt?: string; createdAt?: string; tags?: string[]; + /** Preview card for the first URL in `text`. At most one per post. 
*/ + linkPreview?: LinkPreview; }; export type PostListResponse = {