Adds the front-end side of the link-preview feature so the back-end team has a fixed contract to implement against.

- docs/link-preview.md: full spec for the `/api/link-preview` proxy and the preferred inline-on-Post integration. Covers caching, SSRF guards, metadata-extraction precedence, provider quirks, and the front-end rendering rules. Scope is the first URL only.
- types/post.ts: new `LinkPreview` type and optional `linkPreview` field on `Post`.
- LinkPreviewCard: clickable card with a themeColor accent bar, siteName / title / description (line-clamped), and an optional 1.91:1 thumbnail. Whole card is an `<a target="_blank">` to the canonical URL.
- MessageBubble: render the card between the bubble body and the timestamp, with padding that matches visual vs. text-only bubbles.
- mockPosts: example `linkPreview` payloads on p-005 and p-010 so the visual works when running with VITE_USE_MOCK_POSTS=true, and so the back-end has concrete reference values.

# Link preview (`/api/link-preview`)

Telegram-style rich card for the **first URL** found in a post's text.
Front-end renders a single clickable card showing site name, title,
description, and a thumbnail; the data is fetched from a back-end proxy
that scrapes Open Graph / oEmbed / Twitter Card metadata once and caches
it.

> **Scope**: only the first link in the post text gets a preview, matching
> Telegram's behaviour. Any additional URLs in the same post still render
> as inline autolinks but do not get their own card.

## Why a back-end proxy

Browsers cannot read arbitrary cross-origin pages (the same-origin policy
blocks the response unless the target opts in via CORS), so OG metadata must
be fetched server-side. A single proxy endpoint keeps secrets and outbound IPs
on the server and lets us cache, so each URL is scraped only once for the
whole audience.

---

## Endpoint contract

```
GET /api/link-preview?url=<encoded-absolute-url>
```

| Query | Required | Notes |
| ----- | -------- | ----- |
| `url` | yes      | Absolute `http://` or `https://` URL. Must be percent-encoded (e.g. with `encodeURIComponent`) so query strings inside the target URL survive the round trip. |

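
As a minimal sketch of the encoding requirement, the helper below builds the request path with `encodeURIComponent` and checks that the target survives the round trip. The name `buildPreviewRequestPath` is hypothetical and not part of the existing code, and `https://proxy.invalid` is only a placeholder origin needed to parse a relative path.

```ts
// Hypothetical helper (not from the codebase): builds the proxy request path.
function buildPreviewRequestPath(targetUrl: string): string {
  // encodeURIComponent escapes ":", "/", "?", "&" and "=", so the target's
  // own query string cannot be mistaken for parameters of the proxy call.
  return `/api/link-preview?url=${encodeURIComponent(targetUrl)}`;
}

const target = "https://example.com/search?q=safe&page=2";
const requestPath = buildPreviewRequestPath(target);

// Round trip: parsing the request recovers the original target unchanged.
const roundTripped = new URL(requestPath, "https://proxy.invalid").searchParams.get("url");
```

By contrast, `encodeURI` leaves `?` and `&` intact, so it would not protect the nested query string.
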
### Success: `200 OK`

```json
{
  "url": "https://app.safe.global/welcome",
  "canonicalUrl": "https://app.safe.global/welcome",
  "siteName": "app.safe.global",
  "title": "Safe{Wallet}",
  "description": "Safe{Wallet} is the most trusted smart account wallet on Ethereum with over $100B secured.",
  "imageUrl": "https://app.safe.global/og.png",
  "imageWidth": 1200,
  "imageHeight": 630,
  "favicon": "https://app.safe.global/favicon.ico",
  "themeColor": "#12FF80",
  "fetchedAt": "2026-05-29T10:00:00Z",
  "cacheTtlSeconds": 86400
}
```

- All string fields except `url` may be empty. The front-end hides any
  row whose field is missing (e.g. no `imageUrl` → image area is omitted).
- `url` echoes the original input so the client can match the response
  against the URL it asked about, even if responses arrive out of order.
- `canonicalUrl` is the URL the client should open when the card is tapped.
  Defaults to `url` if no `<link rel=canonical>` was found.

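
The `url` and `canonicalUrl` rules could be applied on the client roughly as follows. This is a sketch under assumptions: `PreviewResponse`, `isResponseFor`, and `cardHref` are illustrative names, and falling back to `url` when `canonicalUrl` is an empty string is a defensive choice rather than documented behaviour.

```ts
// Illustrative subset of the response; only the fields used here.
type PreviewResponse = { url: string; canonicalUrl: string };

// True when a response belongs to the URL the client is currently showing,
// so a late response for an older request can be dropped.
function isResponseFor(requestedUrl: string, response: PreviewResponse): boolean {
  return response.url === requestedUrl;
}

// Link target for the card: canonicalUrl, or url if canonicalUrl is empty.
function cardHref(response: PreviewResponse): string {
  return response.canonicalUrl !== "" ? response.canonicalUrl : response.url;
}
```
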
### Already cached / freshly cached: same shape

The endpoint is idempotent and the response shape is identical whether
the metadata is hot, warm, or freshly scraped.

### Errors

| Status | When | Body shape |
| ------ | ---- | ---------- |
| `400`  | Missing / invalid / non-http(s) `url` | `{ "error": "invalid_url" }` |
| `422`  | URL passed validation but resolves to a private/internal address (SSRF guard) | `{ "error": "blocked_target" }` |
| `404`  | Target returned 404 or fetch produced no metadata | `{ "error": "not_found" }` |
| `408`  | Target took longer than the timeout to respond | `{ "error": "timeout" }` |
| `502`  | Target returned 5xx | `{ "error": "upstream_error" }` |
| `429`  | Rate limit on this client / IP | `{ "error": "rate_limited", "retryAfter": 60 }` |

The front-end treats every non-`200` as “no preview available” and
silently renders nothing. No toasts. URLs already render as inline
clickable text via `autolink`, so the user is never blocked.

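
One way to express the "non-`200` means no preview" rule on the client is sketched below. `previewFromStatus`, `fetchLinkPreview`, and the injected `FetchLike` type are hypothetical names; the `LinkPreview` shape is trimmed to a few fields, and mapping network failures to `null` is an assumption that goes beyond the status table above.

```ts
// Minimal shapes so the sketch is self-contained; the real LinkPreview type has more fields.
type LinkPreview = { url: string; canonicalUrl: string; siteName: string; title: string; description: string };
type HttpLike = { status: number; json(): Promise<unknown> };
type FetchLike = (path: string) => Promise<HttpLike>;

// Pure decision: only a 200 yields a preview; every other status means "render nothing".
function previewFromStatus(status: number, body: unknown): LinkPreview | null {
  return status === 200 ? (body as LinkPreview) : null;
}

// Hypothetical client wrapper. Network failures are also mapped to null,
// which is an assumption; the table above only covers non-200 responses.
async function fetchLinkPreview(targetUrl: string, fetchImpl: FetchLike): Promise<LinkPreview | null> {
  try {
    const res = await fetchImpl(`/api/link-preview?url=${encodeURIComponent(targetUrl)}`);
    if (res.status !== 200) return null; // skip parsing error bodies
    return previewFromStatus(res.status, await res.json());
  } catch {
    return null; // no toast, no error UI
  }
}
```
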

---

## Caching strategy

Store one row per `canonicalUrl` (or normalized `url` if `canonicalUrl` is
absent). Suggested TTLs:

- Successful preview: **24 hours** (`cacheTtlSeconds: 86400`).
- 404 / timeout / blocked: **6 hours** negative cache. Without it, every
  request for a failing target would trigger a fresh scrape.
- Send `Cache-Control: public, max-age=86400` so CDNs and browsers also cache
  the response.

Cache key normalization:

- Lowercase scheme + host.
- Strip the trailing slash on the path when it's the only character.
- Strip `utm_*`, `ref`, `referrer`, `fbclid`, `gclid` query params.
- Keep the rest of the query and fragment as-is.

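
The normalization rules above can be sketched as a single function. `normalizeCacheKey` and `isTrackingParam` are hypothetical names; matching parameter names case-sensitively and dropping any credentials embedded in the URL are assumptions of this sketch, not requirements stated above. The query string is filtered textually rather than re-serialized, so the remaining parameters keep their original encoding.

```ts
const TRACKING_KEYS = new Set(["ref", "referrer", "fbclid", "gclid"]);

// True for "utm_*" and the listed tracking keys (case-sensitive: an assumption).
function isTrackingParam(pair: string): boolean {
  const key = pair.split("=")[0] ?? "";
  return key.startsWith("utm_") || TRACKING_KEYS.has(key);
}

function normalizeCacheKey(rawUrl: string): string {
  const parsed = new URL(rawUrl); // URL parsing lowercases scheme and host
  // Strip the path only when it is the lone "/" character.
  const path = parsed.pathname === "/" ? "" : parsed.pathname;
  // Filter the raw query text so surviving pairs are kept byte-for-byte.
  const kept = parsed.search
    .slice(1)
    .split("&")
    .filter((pair) => pair !== "" && !isTrackingParam(pair));
  const query = kept.length > 0 ? `?${kept.join("&")}` : "";
  // Userinfo (user:pass@) is not re-emitted: an assumption of this sketch.
  return `${parsed.protocol}//${parsed.host}${path}${query}${parsed.hash}`;
}

// "https://example.com?id=7#top"
const exampleKey = normalizeCacheKey("HTTPS://Example.COM/?utm_source=x&id=7&fbclid=abc#top");
```
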

---

## SSRF and abuse guard (must-have)

The proxy fetches any URL the front-end asks about, so without guards it can
be abused to reach internal services (SSRF). Before issuing the outbound
request:

1. Resolve the host to all of its A/AAAA records.
2. Reject if any resolved IP is in: loopback, link-local, private
   (RFC1918), `0.0.0.0/8`, multicast, broadcast, or the internal cluster
   CIDR.
3. Reject schemes other than `http` and `https`.
4. Cap response body at **5 MB**; abort on overflow.
5. Cap total request time at **5 s**; abort on timeout.
6. Cap redirect chain at **3 hops**; re-validate target IP at each hop.
7. Do not forward client cookies, auth headers, or `Referer` to the target.
8. Use a clear `User-Agent` such as `ArkLibraryLinkBot/1.0 (+https://ark-library.com/bot)`.
9. Apply a per-client (IP or session) rate limit, e.g. 60 req / min.

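
Step 2 is the easiest one to get subtly wrong, so here is a minimal sketch of the address check for IPv4 only. All names (`isBlockedIPv4`, `isHostAllowed`, `clusterCidrs`) are hypothetical, the cluster CIDR must be supplied by the deployment, and anything that does not parse as dotted-quad IPv4 (including IPv6 literals) is rejected here purely as a fail-closed assumption; a real implementation would need equivalent IPv6 ranges.

```ts
// Ranges named in step 2, in CIDR form.
const BLOCKED_CIDRS = [
  "127.0.0.0/8",        // loopback
  "169.254.0.0/16",     // link-local
  "10.0.0.0/8",         // RFC1918
  "172.16.0.0/12",      // RFC1918
  "192.168.0.0/16",     // RFC1918
  "0.0.0.0/8",
  "224.0.0.0/4",        // multicast
  "255.255.255.255/32", // broadcast
];

// Parses dotted-quad IPv4 into an unsigned integer, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) return null;
    value = value * 256 + Number(part);
  }
  return value;
}

// Compares the network prefixes using arithmetic (avoids 32-bit signed bit ops).
function inCidr(ip: number, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  const baseInt = ipv4ToInt(base ?? "");
  const bits = Number(bitsText);
  if (baseInt === null || !Number.isInteger(bits) || bits < 0 || bits > 32) return false;
  const blockSize = 2 ** (32 - bits);
  return Math.floor(ip / blockSize) === Math.floor(baseInt / blockSize);
}

function isBlockedIPv4(ip: string, clusterCidrs: string[] = []): boolean {
  const value = ipv4ToInt(ip);
  if (value === null) return true; // fail closed on anything unparseable
  return [...BLOCKED_CIDRS, ...clusterCidrs].some((cidr) => inCidr(value, cidr));
}

// A host is allowed only if it resolved to at least one IP and none is blocked.
function isHostAllowed(resolvedIps: string[], clusterCidrs: string[] = []): boolean {
  return resolvedIps.length > 0 && resolvedIps.every((ip) => !isBlockedIPv4(ip, clusterCidrs));
}
```
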

---

## Metadata extraction precedence

For each field, pick the first present:

| Field          | Sources (in order) |
| -------------- | ------------------ |
| `title`        | `og:title` → `twitter:title` → `<title>` → empty |
| `description`  | `og:description` → `twitter:description` → `<meta name="description">` → empty |
| `imageUrl`     | `og:image:secure_url` → `og:image` → `twitter:image` → first prominent `<img>` (skip if <200×200) → empty |
| `siteName`     | `og:site_name` → `application-name` → hostname (sans `www.`) |
| `canonicalUrl` | `<link rel="canonical">` → request URL |
| `favicon`      | `<link rel="icon">` → `<link rel="shortcut icon">` → `/favicon.ico` |
| `themeColor`   | `<meta name="theme-color">` |

Resolve any relative URLs (`og:image`, `favicon`, `canonical`) against the
final response URL (after redirects).

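
The "first present" rule and the relative-URL resolution might look like this in code. The `MetaTags` shape and the function names are hypothetical, only two of the table's rows are shown, and treating whitespace-only values as absent is an assumption of this sketch.

```ts
// Raw values scraped from the page, keyed by meta property/name (hypothetical shape).
type MetaTags = Record<string, string | undefined>;

// Returns the first candidate that is non-empty after trimming, else "".
function firstPresent(...candidates: Array<string | undefined>): string {
  for (const candidate of candidates) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed;
  }
  return "";
}

// title: og:title → twitter:title → <title> → empty
function pickTitle(meta: MetaTags, titleTag?: string): string {
  return firstPresent(meta["og:title"], meta["twitter:title"], titleTag);
}

// siteName: og:site_name → application-name → hostname (sans "www.")
function pickSiteName(meta: MetaTags, hostname: string): string {
  return firstPresent(meta["og:site_name"], meta["application-name"], hostname.replace(/^www\./, ""));
}

// Resolves a possibly relative URL against the final response URL (after redirects).
function resolveAgainst(finalUrl: string, value: string): string {
  if (value === "") return "";
  try {
    return new URL(value, finalUrl).toString();
  } catch {
    return ""; // unparseable values are treated as absent
  }
}
```
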

---

## Provider quirks worth handling

Handling these up front saves a lot of "why doesn't this site preview?" debugging later.

- **Twitter / X**: `x.com` and `twitter.com` omit OG tags for signed-out
  requests. Use the public oEmbed endpoint
  `https://publish.twitter.com/oembed?url=...&omit_script=1` for
  Twitter/X URLs and map: `title = author_name`, `description = html` stripped
  to text, `imageUrl = thumbnail_url` if available.
- **YouTube**: prefer `https://noembed.com/embed?url=...` or
  `https://www.youtube.com/oembed?url=...&format=json` (no API key needed).
- **Reddit / Mastodon**: standard OG works fine.
- **Sites behind Cloudflare bot challenge**: surface 502 to the client.
  Don't retry immediately; let the negative-cache TTL absorb it.
- **AMP pages**: prefer `og:url` when present so the cached entry points to
  the canonical page, not the AMP variant.

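
For the Twitter / X case, the field mapping could be sketched as below. `twitterOEmbedUrl`, `mapTwitterOEmbed`, and `stripTags` are hypothetical names, and the regex-based tag stripper is a deliberately naive stand-in: it does not decode HTML entities, and a proper HTML parser would be the more robust choice in the real proxy.

```ts
// Subset of the oEmbed fields used by the mapping; all optional because providers may omit them.
type OEmbedResponse = { author_name?: string; html?: string; thumbnail_url?: string };

// Builds the oEmbed request for a Twitter/X post URL.
function twitterOEmbedUrl(postUrl: string): string {
  return `https://publish.twitter.com/oembed?url=${encodeURIComponent(postUrl)}&omit_script=1`;
}

// Naive tag stripper: drops anything that looks like a tag and collapses whitespace.
function stripTags(html: string): string {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// title = author_name, description = html stripped to text, imageUrl = thumbnail_url if available.
function mapTwitterOEmbed(oembed: OEmbedResponse): { title: string; description: string; imageUrl: string | undefined } {
  return {
    title: oembed.author_name ?? "",
    description: stripTags(oembed.html ?? ""),
    imageUrl: oembed.thumbnail_url,
  };
}
```
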

---

## Front-end integration

### Type addition (`src/types/post.ts`)

```ts
export type LinkPreview = {
  url: string;
  canonicalUrl: string;
  siteName: string;
  title: string;
  description: string;
  imageUrl?: string;
  imageWidth?: number;
  imageHeight?: number;
  favicon?: string;
  themeColor?: string;
};

export type Post = {
  // ...existing fields
  /** Preview for the first URL in `text`. At most one per post. */
  linkPreview?: LinkPreview;
};
```

### Which URL gets previewed

The back-end picks the **first** URL it finds in `text` using the same
regex as the front-end's `autolink` (`/(https?:\/\/[^\s<>"]+[^\s<>".,;:!?)\]}'])/i`).
Only that URL is fetched, stored, and returned as `post.linkPreview`. Any
later URLs in the same post are ignored for preview purposes (still
clickable inline via `autolink`).

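
To make the selection rule concrete, here is a small sketch that applies the regex quoted above. The function name `firstUrl` is hypothetical; the pattern is copied verbatim from the text. Because the final character class excludes trailing punctuation, a URL followed by a comma or full stop is captured without it.

```ts
// Same pattern as quoted above; no "g" flag, so exec always returns the first match.
const URL_PATTERN = /(https?:\/\/[^\s<>"]+[^\s<>".,;:!?)\]}'])/i;

// Returns the first URL in the text, or null when the text contains none.
function firstUrl(text: string): string | null {
  const match = URL_PATTERN.exec(text);
  return match?.[0] ?? null;
}

// "https://app.safe.global/welcome"
const picked = firstUrl("See https://app.safe.global/welcome, then https://example.com.");
```

Only `picked` would be sent for scraping; the second URL still autolinks but gets no card.
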
### Where data comes from

Two viable paths; pick one when wiring the back-end.

1. **Inline on `Post`** (preferred): the post API enriches each post with
   `linkPreview`. The first URL in `text` is resolved once at write time
   (or lazily on first read with a background job). The client renders
   without making any extra request.
2. **Client-side lookup**: the client extracts the first URL via the
   existing `autolink` regex, calls `/api/link-preview?url=...` once per
   post (with in-memory dedupe across posts that share the same URL), and
   renders the card when the response comes back. Slower first paint but
   keeps the posts endpoint cheap.

We recommend (1) for the public feed, keeping `/api/link-preview` available
for (2) only on admin previews.

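
For path (2), the in-memory dedupe can be as small as a promise cache keyed by URL. This is a sketch only: `dedupeByUrl` is a hypothetical helper, the cache lives for the lifetime of the page, and entries (including failed lookups) are never evicted.

```ts
type Lookup<T> = (url: string) => Promise<T>;

// Wraps a lookup so that concurrent or repeated calls for the same URL
// share one underlying request.
function dedupeByUrl<T>(lookup: Lookup<T>): Lookup<T> {
  const inFlight = new Map<string, Promise<T>>();
  return (url: string) => {
    const existing = inFlight.get(url);
    if (existing !== undefined) return existing;
    const pending = lookup(url);
    inFlight.set(url, pending); // never evicted in this sketch
    return pending;
  };
}
```

Two posts that link to the same URL then trigger a single `/api/link-preview` call, and both cards render from the same promise.
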
### Rendering

- New component: `src/components/messageStream/LinkPreviewCard.tsx`
- Renders a card with a left vertical 3px accent bar (`themeColor` →
  fallback `bg-ark-gold`).
- Layout:

  ```
  ┌──────────────────────────────────────────────────┐
  │ ▍ siteName (12px, neutral-400)                   │
  │ ▍ Title (15px, bold, neutral-100)                │
  │ ▍ Description (13px, neutral-300, 3-line clamp) │
  │ ▍ ┌────────────────────────────────────────────┐ │
  │ ▍ │ imageUrl (lazy, aspect-video, rounded)     │ │
  │ ▍ └────────────────────────────────────────────┘ │
  └──────────────────────────────────────────────────┘
  ```

- Whole card is `<a href={canonicalUrl} target="_blank" rel="noopener noreferrer">`.
- Reuse the bubble background (`bg-[#272632]` is OK); lift it slightly with a
  `bg-white/[0.03]` overlay so the card reads as inset within the bubble.
- Mount points (text-bearing bubbles only): `TextBubble`,
  `ImageWithTextBubble`, `AlbumBubble`, `VideoBubble`, `FileDocBubble`.
  Render below the existing `CollapsibleText` so cards stay visible even
  when long text is collapsed.

### Picking the URL to preview

If `post.linkPreview` is present, render that single card. Otherwise the
bubble renders nothing extra (URLs still autolink inline). On the preferred
inline path the front-end never picks the URL itself; that decision lives on
the back-end so the client and server agree on which URL was chosen.

### Falling back gracefully

- No `imageUrl` → omit the image area, keep the text block.
- Title shorter than 8 characters → hide the description below (treat as
  a low-confidence preview).
- Title empty and description empty → render nothing.

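
These three rules can be collapsed into one small decision function, sketched below. `cardParts` and `CardParts` are hypothetical names. The rules do not say what happens when the title is empty but a description exists; this sketch applies the short-title rule literally in that case and hides the description.

```ts
type CardInput = { title: string; description: string; imageUrl?: string };
type CardParts = { showImage: boolean; showDescription: boolean };

// Returns null when nothing should be rendered, otherwise which optional
// parts of the card are visible.
function cardParts(preview: CardInput): CardParts | null {
  if (preview.title === "" && preview.description === "") return null;
  return {
    showImage: Boolean(preview.imageUrl),
    // Titles shorter than 8 characters are treated as low-confidence previews.
    showDescription: preview.description !== "" && preview.title.length >= 8,
  };
}
```
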

---

## Open questions for the back-end

- Where in the stack will OG extraction live? Existing post pipeline, a
  worker queue, or inline on read?
- Storage: a new `link_previews` table keyed by `canonicalUrl`, with a
  `post_link_previews` join table preserving original URL order, or just a
  JSON column on `posts`?
- How aggressive should re-scraping be? E.g. re-scrape every 30 days for
  successful previews, every 24 hours for `themeColor` updates.
- Should admins be able to override / hide a preview per post? Telegram has
  a "no preview" toggle and editors often want it.
- Do we want a manual "refresh preview" button in the admin UI?