feat(link-preview): frontend interface for Telegram-style URL preview
Adds the front-end side of the link-preview feature so the back-end team has a fixed contract to implement against.

- docs/link-preview.md: full spec for the `/api/link-preview` proxy and the preferred inline-on-Post integration. Covers caching, SSRF guards, metadata-extraction precedence, provider quirks, and the front-end rendering rules. Scope is the first URL only.
- types/post.ts: new `LinkPreview` type and optional `linkPreview` field on `Post`.
- LinkPreviewCard: clickable card with a themeColor accent bar, siteName / title / description (line-clamped), and an optional 1.91:1 thumbnail. The whole card is an `<a target="_blank">` to the canonical URL.
- MessageBubble: render the card between the bubble body and the timestamp, with padding adjusted for visual vs. text-only bubbles.
- mockPosts: example `linkPreview` payloads on p-005 and p-010 so the visual works when running with VITE_USE_MOCK_POSTS=true, and so the back-end has concrete reference values.
258
docs/link-preview.md
Normal file
@@ -0,0 +1,258 @@
# Link preview (`/api/link-preview`)

Telegram-style rich card for the **first URL** found in a post's text.
The front-end renders a single clickable card showing the site name, title,
description, and a thumbnail. The data comes from a back-end proxy
that scrapes Open Graph / oEmbed / Twitter Card metadata once and caches
it.

> **Scope**: only the first link in the post text gets a preview, matching
> Telegram's behaviour. Any additional URLs in the same post still render
> as inline autolinks but do not get their own card.

## Why a back-end proxy

Browsers cannot read arbitrary cross-origin pages (the same-origin policy blocks
access to the response), so OG metadata must be fetched server-side. A single
proxy endpoint keeps secrets and outbound IPs on the server and lets us cache
results, so each URL is scraped only once for the whole audience.

---

## Endpoint contract

```
GET /api/link-preview?url=<encoded-absolute-url>
```

| Query | Required | Notes |
| ----- | -------- | ----- |
| `url` | yes      | Absolute `http://` or `https://` URL. Must be percent-encoded (e.g. with `encodeURIComponent`) so query strings inside the target URL survive the round trip. |
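
Here is a minimal client-side sketch of building the request. The helper name `buildLinkPreviewRequest` is hypothetical. Percent-encoding the whole target is what keeps the target's own query string intact. Without it, `&c=2` in `https://example.com/a?b=1&c=2` would be read as a second parameter of the proxy request.

```typescript
// Hypothetical helper: the spec only fixes the path and the `url` parameter.
export function buildLinkPreviewRequest(targetUrl: string): string {
  const parsed = new URL(targetUrl); // throws on relative or malformed input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported scheme: ${parsed.protocol}`);
  }
  // encodeURIComponent escapes ":", "/", "?", "&", "=" and "#", so the
  // target's own query string stays inside the single `url` parameter.
  return `/api/link-preview?url=${encodeURIComponent(targetUrl)}`;
}
```

With encoding, `https://example.com/a?b=1&c=2` becomes `/api/link-preview?url=https%3A%2F%2Fexample.com%2Fa%3Fb%3D1%26c%3D2`.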

### Success: `200 OK`

```json
{
  "url": "https://app.safe.global/welcome",
  "canonicalUrl": "https://app.safe.global/welcome",
  "siteName": "app.safe.global",
  "title": "Safe{Wallet}",
  "description": "Safe{Wallet} is the most trusted smart account wallet on Ethereum with over $100B secured.",
  "imageUrl": "https://app.safe.global/og.png",
  "imageWidth": 1200,
  "imageHeight": 630,
  "favicon": "https://app.safe.global/favicon.ico",
  "themeColor": "#12FF80",
  "fetchedAt": "2026-05-29T10:00:00Z",
  "cacheTtlSeconds": 86400
}
```

- All string fields except `url` and `canonicalUrl` may be empty. The front-end
  hides rows whose data is missing (e.g. no `imageUrl` → the image area is omitted).
- `url` echoes the original input so the client can match each response
  against the URL it asked about, even if responses arrive out of order.
- `canonicalUrl` is the URL the client should open when the card is tapped.
  It defaults to `url` if no `<link rel="canonical">` was found.

### Already cached / freshly cached: same shape

The endpoint is idempotent, and the response shape is identical whether
the metadata is hot, warm, or freshly scraped.

### Errors

| Status | When | Body shape |
| ------ | ---- | ---------- |
| `400` | Missing / invalid / non-http(s) `url` | `{ "error": "invalid_url" }` |
| `422` | URL passed validation but resolves to a private/internal address (SSRF guard) | `{ "error": "blocked_target" }` |
| `404` | Target returned 404 or the fetch produced no metadata | `{ "error": "not_found" }` |
| `504` | Target took longer than the timeout to respond (gateway timeout) | `{ "error": "timeout" }` |
| `502` | Target returned 5xx | `{ "error": "upstream_error" }` |
| `429` | Rate limit exceeded for this client / IP | `{ "error": "rate_limited", "retryAfter": 60 }` |

The front-end treats every non-`200` response as “no preview available” and
silently renders nothing (no toasts). URLs already render as inline
clickable text via `autolink`, so the user is never blocked.
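
Here is a sketch of the client side of this error policy, for the client-side lookup path described under "Where data comes from". The helper `fetchLinkPreview` and the `LinkPreviewResponse` type name are hypothetical. Every failure path collapses to `null`, which the UI treats as "no preview". Failure paths include a network error, a non-`200` status, an unparseable body, and a body whose `url` does not echo the request. `fetchImpl` is injectable only to make the sketch testable.

```typescript
// Assumed response shape: the fields from the JSON example above.
export type LinkPreviewResponse = {
  url: string;
  canonicalUrl: string;
  siteName: string;
  title: string;
  description: string;
  imageUrl?: string;
  imageWidth?: number;
  imageHeight?: number;
  favicon?: string;
  themeColor?: string;
  fetchedAt: string;
  cacheTtlSeconds: number;
};

// Hypothetical helper: any failure means "no preview", never an error toast.
export async function fetchLinkPreview(
  targetUrl: string,
  fetchImpl: typeof fetch = fetch,
): Promise<LinkPreviewResponse | null> {
  try {
    const res = await fetchImpl(
      `/api/link-preview?url=${encodeURIComponent(targetUrl)}`,
    );
    if (res.status !== 200) return null; // every non-200 is "no preview"
    const body = (await res.json()) as Partial<LinkPreviewResponse>;
    // `url` echoes the request, so a body for a different URL is discarded.
    if (body.url !== targetUrl || typeof body.canonicalUrl !== "string") {
      return null;
    }
    return body as LinkPreviewResponse;
  } catch {
    return null; // network failure or invalid JSON
  }
}
```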

---

## Caching strategy

Store one row per `canonicalUrl` (or per normalized `url` if `canonicalUrl` is
absent). Suggested TTLs:

- Successful preview: **24 hours** (`cacheTtlSeconds: 86400`).
- 404 / timeout / blocked: **6 hours** negative cache. Without it, every request
  for a failing URL triggers a fresh outbound fetch, hammering both the proxy and
  the target site.
- Send `Cache-Control: public, max-age=86400` on successful responses so CDNs and
  browsers cache them too.
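
The TTL rules can be sketched as a lookup from the error codes in the table above to seconds. The function names are hypothetical. Treating `upstream_error` as negatively cached follows the Cloudflare note under "Provider quirks". Not caching `invalid_url` and `rate_limited` is an assumption, since neither outcome says anything about the target.

```typescript
// Outcomes mirror the `error` codes in the Errors table, plus "ok".
type Outcome =
  | "ok"
  | "invalid_url"
  | "blocked_target"
  | "not_found"
  | "timeout"
  | "upstream_error"
  | "rate_limited";

const SUCCESS_TTL = 24 * 60 * 60; // 86400 s
const NEGATIVE_TTL = 6 * 60 * 60; // 21600 s

export function ttlFor(outcome: Outcome): number {
  switch (outcome) {
    case "ok":
      return SUCCESS_TTL;
    case "not_found":
    case "timeout":
    case "blocked_target":
    case "upstream_error": // per the Cloudflare note under "Provider quirks"
      return NEGATIVE_TTL;
    default:
      return 0; // assumption: bad input and per-client limits are not cached
  }
}

// Only successful previews are marked cacheable for CDNs and browsers;
// the negative cache lives on the server.
export function cacheControlFor(outcome: Outcome): string {
  return outcome === "ok" ? `public, max-age=${SUCCESS_TTL}` : "no-store";
}
```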

Cache key normalization:

- Lowercase the scheme and host.
- If the path is just `/`, drop it (so `https://example.com/` and
  `https://example.com` share a key).
- Strip the `utm_*`, `ref`, `referrer`, `fbclid`, and `gclid` query params.
- Keep the rest of the query and the fragment as-is.
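
Here is a sketch of these rules using the WHATWG `URL` class, which is available in browsers and Node. The name `normalizeCacheKey` is hypothetical. `URL` already lowercases the scheme and host. It also drops a default port such as `:443`, an extra normalization the list does not mention. The query is filtered as raw `key=value` pairs rather than through `searchParams`. Re-serializing `searchParams` would re-encode the untouched parameters (for example `%20` becomes `+`), which would break the "as-is" rule.

```typescript
const TRACKING_PARAMS = new Set(["ref", "referrer", "fbclid", "gclid"]);

function isTrackingParam(name: string): boolean {
  return name.startsWith("utm_") || TRACKING_PARAMS.has(name);
}

// Hypothetical helper implementing the normalization list above.
export function normalizeCacheKey(raw: string): string {
  const u = new URL(raw); // lowercases scheme + host, throws on invalid input
  const kept = u.search
    .slice(1) // drop the leading "?"
    .split("&")
    .filter((pair) => pair !== "" && !isTrackingParam(pair.split("=")[0]));
  const query = kept.length > 0 ? `?${kept.join("&")}` : "";
  // A path of just "/" is dropped; any longer path is kept verbatim.
  const path = u.pathname === "/" ? "" : u.pathname;
  return `${u.protocol}//${u.host}${path}${query}${u.hash}`;
}
```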

---

## SSRF and abuse guard (must-have)

The proxy fetches whatever URL the front-end asks about, which makes it an
SSRF vector. Before issuing the outbound request:

1. Resolve the host to all of its A/AAAA records.
2. Reject if any resolved IP is in: loopback, link-local, private
   (RFC 1918, or IPv6 unique-local `fc00::/7`), `0.0.0.0/8`, multicast,
   broadcast, or the internal cluster CIDR. Then connect to one of the
   validated IPs instead of resolving the host again; otherwise a DNS-rebinding
   answer can swap in an internal address after the check.
3. Reject schemes other than `http` and `https`.
4. Cap the response body at **5 MB**; abort on overflow.
5. Cap the total request time at **5 s**; abort on timeout.
6. Cap the redirect chain at **3 hops**; re-validate the target IP at each hop.
7. Do not forward client cookies, auth headers, or `Referer` to the target.
8. Use a clear `User-Agent` such as `ArkLibraryLinkBot/1.0 (+https://ark-library.com/bot)`.
9. Apply a per-client (IP or session) rate limit, e.g. 60 req / min, and
   return `429` when it is exceeded.
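
Steps 1 and 2 could look like this on a Node back-end. This is a sketch using `net.BlockList` and `net.isIP` from Node's standard library. The addresses would come from something like `dns.promises.lookup(host, { all: true })`. The lookup itself is left out so the check stays pure. `CLUSTER_CIDRS` is a hypothetical placeholder for the internal cluster range(s). The IPv6 entries extend the step 2 list to `::1`, `fe80::/10`, `fc00::/7`, and `ff00::/8`.

```typescript
import { BlockList, isIP } from "node:net";

// Hypothetical placeholder: fill in the internal cluster range(s)
// as [network, prefixLength] pairs.
const CLUSTER_CIDRS: Array<[string, number]> = [];

const blocked = new BlockList();
blocked.addSubnet("0.0.0.0", 8, "ipv4"); // "this network"
blocked.addSubnet("127.0.0.0", 8, "ipv4"); // loopback
blocked.addSubnet("169.254.0.0", 16, "ipv4"); // link-local
blocked.addSubnet("10.0.0.0", 8, "ipv4"); // RFC 1918
blocked.addSubnet("172.16.0.0", 12, "ipv4"); // RFC 1918
blocked.addSubnet("192.168.0.0", 16, "ipv4"); // RFC 1918
blocked.addSubnet("224.0.0.0", 4, "ipv4"); // multicast
blocked.addAddress("255.255.255.255", "ipv4"); // limited broadcast
blocked.addAddress("::", "ipv6"); // unspecified
blocked.addAddress("::1", "ipv6"); // loopback
blocked.addSubnet("fe80::", 10, "ipv6"); // link-local
blocked.addSubnet("fc00::", 7, "ipv6"); // unique local
blocked.addSubnet("ff00::", 8, "ipv6"); // multicast
for (const [network, prefix] of CLUSTER_CIDRS) {
  blocked.addSubnet(network, prefix, "ipv4");
}

/** True only if there is at least one address and every one is allowed. */
export function allAddressesAllowed(addresses: string[]): boolean {
  if (addresses.length === 0) return false; // nothing resolved: block
  return addresses.every((addr) => {
    const family = isIP(addr); // 4, 6, or 0 for "not an IP literal"
    if (family === 0) return false;
    return !blocked.check(addr, family === 4 ? "ipv4" : "ipv6");
  });
}
```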

---

## Metadata extraction precedence

For each field, pick the first source that is present:

| Field          | Sources (in order) |
| -------------- | ------------------ |
| `title`        | `og:title` → `twitter:title` → `<title>` → empty |
| `description`  | `og:description` → `twitter:description` → `<meta name="description">` → empty |
| `imageUrl`     | `og:image:secure_url` → `og:image` → `twitter:image` → first prominent `<img>` (skip if smaller than 200×200) → empty |
| `siteName`     | `og:site_name` → `<meta name="application-name">` → hostname (without `www.`) |
| `canonicalUrl` | `<link rel="canonical">` → request URL |
| `favicon`      | `<link rel="icon">` → `<link rel="shortcut icon">` → `/favicon.ico` |
| `themeColor`   | `<meta name="theme-color">` |

Resolve any relative URLs (`og:image`, `favicon`, `canonical`) against the
final response URL (after redirects).
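
Here is a sketch of the precedence table as plain functions. It assumes the HTML has already been parsed into a flat map of tag values; the `RawTags` shape and its key names are hypothetical. The sketch leaves out the "first prominent `<img>`" image fallback, which needs the parsed DOM and image dimensions. Relative URLs are resolved with `new URL(value, finalUrl)`, as the paragraph above requires.

```typescript
// Hypothetical input: values keyed by meta property/name or link rel
// (e.g. "og:title", "description", "canonical", "icon"), plus "<title>".
type RawTags = Record<string, string | undefined>;

function firstPresent(tags: RawTags, keys: string[]): string {
  for (const key of keys) {
    const value = tags[key]?.trim();
    if (value) return value;
  }
  return "";
}

// Resolve against the final (post-redirect) URL; unparseable values are dropped.
function resolveUrl(value: string, base: string): string | undefined {
  if (!value) return undefined;
  try {
    return new URL(value, base).toString();
  } catch {
    return undefined;
  }
}

export function extractPreview(tags: RawTags, requestUrl: string, finalUrl: string) {
  const hostname = new URL(finalUrl).hostname.replace(/^www\./, "");
  return {
    title: firstPresent(tags, ["og:title", "twitter:title", "<title>"]),
    description: firstPresent(tags, ["og:description", "twitter:description", "description"]),
    imageUrl: resolveUrl(
      firstPresent(tags, ["og:image:secure_url", "og:image", "twitter:image"]),
      finalUrl,
    ),
    siteName: firstPresent(tags, ["og:site_name", "application-name"]) || hostname,
    canonicalUrl: resolveUrl(firstPresent(tags, ["canonical"]), finalUrl) ?? requestUrl,
    favicon: resolveUrl(firstPresent(tags, ["icon", "shortcut icon"]) || "/favicon.ico", finalUrl),
    themeColor: firstPresent(tags, ["theme-color"]) || undefined,
  };
}
```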

---

## Provider quirks worth handling

Handling these up front saves a lot of "why doesn't this site preview?" debugging later.

- **Twitter / X**: `x.com` and `twitter.com` serve pages without OG tags to
  signed-out clients. Use the public oEmbed endpoint
  `https://publish.twitter.com/oembed?url=...&omit_script=1` for
  Twitter/X URLs and map: `title = author_name`, `description = html` stripped
  to text, `imageUrl = thumbnail_url` if available.
- **YouTube**: prefer `https://noembed.com/embed?url=...` or
  `https://www.youtube.com/oembed?url=...&format=json` (neither needs an API key).
- **Reddit / Mastodon**: standard OG works fine.
- **Sites behind a Cloudflare bot challenge**: surface `502` to the client.
  Don't retry immediately; let the negative-cache TTL absorb it.
- **AMP pages**: prefer `og:url` when present so the cached entry points to
  the canonical page, not the AMP variant.
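
The Twitter / X and YouTube rules amount to a host check that routes the URL to an oEmbed endpoint instead of the page. This is a sketch with hypothetical function names. It uses YouTube's own endpoint rather than noembed. The tweet mapping strips tags with a regex and leaves HTML entities undecoded. That is enough for a short description, but it is not a general HTML-to-text converter.

```typescript
const TWITTER_HOSTS = new Set(["x.com", "twitter.com", "mobile.twitter.com"]);
const YOUTUBE_HOSTS = new Set(["youtube.com", "m.youtube.com", "youtu.be"]);

// Returns the oEmbed URL to call instead of scraping, or null for normal OG parsing.
export function oembedEndpointFor(target: string): string | null {
  const host = new URL(target).hostname.replace(/^www\./, "");
  const encoded = encodeURIComponent(target);
  if (TWITTER_HOSTS.has(host)) {
    return `https://publish.twitter.com/oembed?url=${encoded}&omit_script=1`;
  }
  if (YOUTUBE_HOSTS.has(host)) {
    return `https://www.youtube.com/oembed?url=${encoded}&format=json`;
  }
  return null;
}

// Maps the Twitter/X oEmbed JSON as described in the list above.
export function mapTweetOembed(o: { author_name?: string; html?: string; thumbnail_url?: string }) {
  const text = (o.html ?? "")
    .replace(/<[^>]*>/g, " ") // crude tag strip; entities stay encoded
    .replace(/\s+/g, " ")
    .trim();
  return { title: o.author_name ?? "", description: text, imageUrl: o.thumbnail_url };
}
```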

---

## Front-end integration

### Type addition (`src/types/post.ts`)

```ts
export type LinkPreview = {
  url: string;
  /** URL to open when the card is tapped; defaults to `url`. */
  canonicalUrl: string;
  siteName: string;
  title: string;
  description: string;
  imageUrl?: string;
  imageWidth?: number;
  imageHeight?: number;
  favicon?: string;
  /** Accent-bar colour; the card falls back to `bg-ark-gold` when absent. */
  themeColor?: string;
};

export type Post = {
  // ...existing fields
  /** Preview for the first URL in `text`. At most one per post. */
  linkPreview?: LinkPreview;
};
```

### Which URL gets previewed

The back-end picks the **first** URL it finds in `text` using the same
regex as the front-end's `autolink` (`/(https?:\/\/[^\s<>"]+[^\s<>".,;:!?)\]}'])/i`).
Only that URL is fetched, stored, and returned as `post.linkPreview`. Any
later URLs in the same post are ignored for preview purposes (they stay
clickable inline via `autolink`).
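
On a TypeScript back-end, picking the URL is a single non-global match with the regex above, which returns the leftmost URL. `firstPreviewUrl` is a hypothetical name. Importing the regex from the module that `autolink` uses, rather than copying it as here, would keep the two from drifting.

```typescript
// Copied from the `autolink` pattern quoted above; assumed to stay in sync.
const URL_PATTERN = /(https?:\/\/[^\s<>"]+[^\s<>".,;:!?)\]}'])/i;

export function firstPreviewUrl(text: string): string | null {
  // Without the `g` flag, `match` returns only the leftmost match.
  const m = text.match(URL_PATTERN);
  return m ? m[1] : null;
}
```

For `Read https://example.com/a, then https://b.com`, this returns `https://example.com/a`, because the final character class excludes the trailing comma.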

### Where data comes from

There are two viable paths; pick one when wiring the back-end.

1. **Inline on `Post`** (preferred): the post API enriches each post with
   `linkPreview`. The first URL in `text` is resolved once at write time
   (or lazily on first read via a background job). The client renders
   the card without making any extra request.
2. **Client-side lookup**: the client extracts the first URL via the
   existing `autolink` regex, calls `/api/link-preview?url=...` once per
   post (with in-memory dedupe across posts that share the same URL), and
   renders the card when the response comes back. First paint is slower,
   but the posts endpoint stays cheap.

We recommend (1) for the public feed, keeping `/api/link-preview` available
for (2) only on admin previews.
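
For option (2), the in-memory dedupe can be a map from URL to the in-flight promise, so posts that share a link trigger one request. This is a sketch with hypothetical names. `fetcher` would be whatever calls `/api/link-preview`. Entries are never evicted, which is fine for a page session but not for a long-lived process.

```typescript
type Preview = { url: string; canonicalUrl: string; title: string; description: string };
type Fetcher = (url: string) => Promise<Preview | null>;

export function createPreviewLoader(fetcher: Fetcher) {
  const inFlight = new Map<string, Promise<Preview | null>>();
  return function load(url: string): Promise<Preview | null> {
    let pending = inFlight.get(url);
    if (!pending) {
      // Failures become null ("no preview"), matching the error policy.
      pending = fetcher(url).catch(() => null);
      inFlight.set(url, pending); // later callers reuse the same promise
    }
    return pending;
  };
}
```

Sharing the promise rather than the resolved value also deduplicates requests that start before the first response has arrived.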

### Rendering

- New component: `src/components/messageStream/LinkPreviewCard.tsx`.
- It renders a card with a 3px vertical accent bar on the left (`themeColor`,
  with `bg-ark-gold` as the fallback).
- Layout:

  ```
  ┌──────────────────────────────────────────────────┐
  │ ▍ siteName (12px, neutral-400)                   │
  │ ▍ Title (15px, bold, neutral-100)                │
  │ ▍ Description (13px, neutral-300, 3-line clamp)  │
  │ ▍ ┌────────────────────────────────────────────┐ │
  │ ▍ │ imageUrl (lazy, aspect-video, rounded)     │ │
  │ ▍ └────────────────────────────────────────────┘ │
  └──────────────────────────────────────────────────┘
  ```

- The whole card is `<a href={canonicalUrl} target="_blank" rel="noopener noreferrer">`.
- Reuse the bubble background (`bg-[#272632]` is fine), lifted slightly with a
  `bg-white/[0.03]` overlay so the card reads as inset within the bubble.
- Mount points (text-bearing bubbles only): `TextBubble`,
  `ImageWithTextBubble`, `AlbumBubble`, `VideoBubble`, `FileDocBubble`.
  Render the card below the existing `CollapsibleText` so it stays visible
  even when long text is collapsed.
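
One detail the bullets leave open is how `themeColor` reaches the DOM. This sketch assumes, beyond what the spec says, that only plain `#rgb` / `#rrggbb` values go into an inline style. Anything else, including a missing value, falls back to the `bg-ark-gold` class. `accentFor` and the `Accent` shape are hypothetical.

```typescript
const HEX_COLOR = /^#(?:[0-9a-f]{3}|[0-9a-f]{6})$/i;

export type Accent =
  | { kind: "style"; backgroundColor: string } // inline style on the bar
  | { kind: "class"; className: "bg-ark-gold" }; // Tailwind fallback

export function accentFor(themeColor?: string): Accent {
  const color = themeColor?.trim() ?? "";
  return HEX_COLOR.test(color)
    ? { kind: "style", backgroundColor: color }
    : { kind: "class", className: "bg-ark-gold" };
}
```

Restricting the inline style to hex literals keeps arbitrary page-supplied strings out of `style`. Other valid CSS colours, such as `rgb(...)` or named colours, simply fall back to gold.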

### Picking the URL to preview

If `post.linkPreview` is present, render that single card. Otherwise the
bubble renders nothing extra (URLs still autolink inline). The front-end
never picks the URL itself; that decision lives on the back-end, so the
client and server always agree on which URL was chosen.

### Falling back gracefully

- No `imageUrl` → omit the image area and keep the text block.
- Title shorter than 8 characters → hide the description (treat it as
  a low-confidence preview).
- Title and description both empty → render nothing.
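
The three rules can be expressed as a pure function that the card calls before rendering. This is a sketch with hypothetical names, and trimming whitespace is an assumption. The rules are applied literally, so an empty title (shorter than 8 characters) also hides a non-empty description.

```typescript
type PreviewFields = { siteName: string; title: string; description: string; imageUrl?: string };

// What the card should show; null means "render no card at all".
export type CardLayout = {
  siteName: string;
  title: string;
  description: string | null;
  imageUrl: string | null;
};

export function cardLayout(p: PreviewFields): CardLayout | null {
  const title = p.title.trim();
  const description = p.description.trim();
  // Rule 3: title and description both empty, so nothing to render.
  if (title === "" && description === "") return null;
  return {
    siteName: p.siteName,
    title,
    // Rule 2: a short title marks a low-confidence preview; drop the description.
    description: title.length < 8 || description === "" ? null : description,
    // Rule 1: no image URL means no image area.
    imageUrl: p.imageUrl ? p.imageUrl : null,
  };
}
```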

---

## Open questions for the back-end

- Where in the stack will OG extraction live: the existing post pipeline, a
  worker queue, or inline on read?
- Storage: a new `link_previews` table keyed by `canonicalUrl`, with a
  `post_link_previews` join table preserving original URL order, or just a
  JSON column on `posts`?
- How aggressive should re-scraping be? For example, every 30 days for
  successful previews, and every 24 hours to pick up `themeColor` updates.
- Should admins be able to override or hide a preview per post? Telegram has
  a "no preview" toggle and editors often want it.
- Do we want a manual "refresh preview" button in the admin UI?