Adds the front-end side of the link-preview feature so the back-end team has a fixed contract to implement against.

- docs/link-preview.md: full spec for the `/api/link-preview` proxy and the preferred inline-on-Post integration. Covers caching, SSRF guards, metadata-extraction precedence, provider quirks, and the front-end rendering rules. Scope is the first URL only.
- types/post.ts: new `LinkPreview` type and optional `linkPreview` field on `Post`.
- LinkPreviewCard: clickable card with a themeColor accent bar, siteName / title / description (line-clamped), and an optional 1.91:1 thumbnail. Whole card is an `<a target="_blank">` to the canonical URL.
- MessageBubble: render the card between the bubble body and the timestamp, with padding that matches visual vs. text-only bubbles.
- mockPosts: example `linkPreview` payloads on p-005 and p-010 so the visual works when running with VITE_USE_MOCK_POSTS=true, and so the back-end has concrete reference values.
Link preview (/api/link-preview)
Telegram-style rich card for the first URL found in a post's text. Front-end renders a single clickable card showing site name, title, description, and a thumbnail; the data is fetched from a back-end proxy that scrapes Open Graph / oEmbed / Twitter Card metadata once and caches it.
Scope: only the first link in the post text gets a preview, matching Telegram's behaviour. Any additional URLs in the same post still render as inline autolinks but do not get their own card.
Why a back-end proxy
Browsers cannot read arbitrary cross-origin pages because of the same-origin policy, so OG metadata must be fetched server-side. A single proxy endpoint keeps secrets / outbound IPs on the server and lets us cache so each URL is only scraped once for the whole audience.
Endpoint contract
GET /api/link-preview?url=<encoded-absolute-url>
| Query | Required | Notes |
|---|---|---|
| `url` | yes | Absolute `http://` or `https://` URL. Must be URI-encoded so query strings inside the target URL survive the round trip. |
Success — 200 OK
{
"url": "https://app.safe.global/welcome",
"canonicalUrl": "https://app.safe.global/welcome",
"siteName": "app.safe.global",
"title": "Safe{Wallet}",
"description": "Safe{Wallet} is the most trusted smart account wallet on Ethereum with over $100B secured.",
"imageUrl": "https://app.safe.global/og.png",
"imageWidth": 1200,
"imageHeight": 630,
"favicon": "https://app.safe.global/favicon.ico",
"themeColor": "#12FF80",
"fetchedAt": "2026-05-29T10:00:00Z",
"cacheTtlSeconds": 86400
}
- All string fields except `url` may be empty. The front-end gracefully hides rows that are missing (e.g. no `imageUrl` → image area is omitted).
- `url` echoes the original input so the client can match the response against the URL it asked about, even if the request was racy.
- `canonicalUrl` is the URL the client should open when the card is tapped. Defaults to `url` if no `<link rel=canonical>` was found.
Already cached / freshly cached — same shape
The endpoint is idempotent and the response shape is identical whether the metadata is hot, warm, or freshly scraped.
Errors
| Status | When | Body shape |
|---|---|---|
| `400` | Missing / invalid / non-http(s) `url` | `{ "error": "invalid_url" }` |
| `422` | URL passed validation but resolves to a private/internal address (SSRF guard) | `{ "error": "blocked_target" }` |
| `404` | Target returned 404 or fetch produced no metadata | `{ "error": "not_found" }` |
| `408` | Target took longer than the timeout to respond | `{ "error": "timeout" }` |
| `502` | Target returned 5xx | `{ "error": "upstream_error" }` |
| `429` | Rate limit on this client / IP | `{ "error": "rate_limited", "retryAfter": 60 }` |
The front-end treats every non-200 as “no preview available” and
silently renders nothing. No toasts. URLs already render as inline
clickable text via autolink, so the user is never blocked.
Caching strategy
Store one row per `canonicalUrl` (or normalized `url` if `canonicalUrl` is
absent). Suggested TTLs:
- Successful preview: 24 hours (`cacheTtlSeconds: 86400`).
- 404 / timeout / blocked: 6 hours negative cache. Otherwise transient failures on the target site will hammer the proxy.
- Send `Cache-Control: public, max-age=86400` so CDN / browser also cache.
Cache key normalization:
- Lowercase scheme + host.
- Strip the trailing slash on the path when it's the only character.
- Strip `utm_*`, `ref`, `referrer`, `fbclid`, `gclid` query params.
- Keep the rest of the query and fragment as-is.
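The normalization rules above can be sketched as a small pure function. This is only an illustration, not the back-end's implementation: `normalizeCacheKey` and `isTrackingParam` are hypothetical names, and matching parameter names case-sensitively is an assumption the spec does not settle.

```typescript
// Query parameters dropped from the cache key, per the list above.
const TRACKING_PARAMS = new Set(["ref", "referrer", "fbclid", "gclid"]);

// Assumption: names are compared case-sensitively against the listed forms.
function isTrackingParam(name: string): boolean {
  return name.startsWith("utm_") || TRACKING_PARAMS.has(name);
}

function normalizeCacheKey(rawUrl: string): string {
  // The URL parser already lowercases the scheme and host for http(s) URLs.
  const u = new URL(rawUrl);

  // Filter the raw query string pair by pair so the kept pairs stay byte-for-byte as-is.
  const keptPairs = u.search
    .slice(1)
    .split("&")
    .filter((pair) => pair !== "" && !isTrackingParam(pair.split("=")[0]));
  const query = keptPairs.length > 0 ? `?${keptPairs.join("&")}` : "";

  // Drop the path only when it is a lone "/".
  const path = u.pathname === "/" ? "" : u.pathname;

  return `${u.protocol}//${u.host}${path}${query}${u.hash}`;
}
```

With this sketch, `HTTPS://Example.COM/?utm_source=x&id=5&fbclid=abc#top` and `https://example.com?id=5#top` share one cache row.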
SSRF and abuse guard (must-have)
The proxy will fetch any URL the front-end asks about, which is dangerous. Before issuing the outbound request:
- Resolve the host to all of its A/AAAA records.
- Reject if any resolved IP is in: loopback, link-local, private (RFC1918), `0.0.0.0/8`, multicast, broadcast, or the internal cluster CIDR.
- Reject schemes other than `http` and `https`.
- Cap response body at 5 MB; abort on overflow.
- Cap request total time at 5 s; abort on timeout.
- Cap redirect chain at 3 hops; re-validate target IP at each hop.
- Do not forward client cookies, auth headers, or `Referer` to the target.
- Use a clear `User-Agent` such as `ArkLibraryLinkBot/1.0 (+https://ark-library.com/bot)`.
- Per-client (IP or session) rate limit, e.g. 60 req / min.
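As an illustration of the resolved-IP check, here is a minimal IPv4-only sketch. All names (`isBlockedTarget`, `BLOCKED_V4_CIDRS`, `extraCidrs`) are hypothetical, DNS resolution is assumed to happen elsewhere, and anything that does not parse as dotted-quad IPv4 (including every IPv6 address) is rejected, which a real implementation would replace with proper IPv6 range checks.

```typescript
// Ranges the proxy must never fetch from (IPv4 only in this sketch).
const BLOCKED_V4_CIDRS: readonly string[] = [
  "0.0.0.0/8", // "this network"
  "10.0.0.0/8", // RFC1918 private
  "127.0.0.0/8", // loopback
  "169.254.0.0/16", // link-local
  "172.16.0.0/12", // RFC1918 private
  "192.168.0.0/16", // RFC1918 private
  "224.0.0.0/4", // multicast
  "255.255.255.255/32", // limited broadcast
];

// Parses dotted-quad IPv4 into an unsigned integer; returns null for anything else.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` falls inside `cidr`; uses division to avoid signed 32-bit bit ops.
function inCidr(ip: number, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  const baseInt = ipv4ToInt(base);
  if (baseInt === null) return false;
  const size = 2 ** (32 - Number(bitsText));
  return Math.floor(ip / size) === Math.floor(baseInt / size);
}

// Rejects the target if ANY resolved address is blocked.
// `extraCidrs` is where the internal cluster CIDR would be passed in.
function isBlockedTarget(resolvedIps: string[], extraCidrs: string[] = []): boolean {
  const cidrs = [...BLOCKED_V4_CIDRS, ...extraCidrs];
  return resolvedIps.some((ip) => {
    const n = ipv4ToInt(ip);
    if (n === null) return true; // fail closed on non-IPv4 input
    return cidrs.some((cidr) => inCidr(n, cidr));
  });
}
```

The same check would need to run again on the resolved address of every redirect hop.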
Metadata extraction precedence
For each field, pick the first present:
| Field | Sources (in order) |
|---|---|
| `title` | `og:title` → `twitter:title` → `<title>` → empty |
| `description` | `og:description` → `twitter:description` → `<meta name="description">` → empty |
| `imageUrl` | `og:image:secure_url` → `og:image` → `twitter:image` → first prominent `<img>` (skip if <200×200) → empty |
| `siteName` | `og:site_name` → `application-name` → hostname (sans `www.`) |
| `canonicalUrl` | `<link rel="canonical">` → request URL |
| `favicon` | `<link rel="icon">` → `<link rel="shortcut icon">` → `/favicon.ico` |
| `themeColor` | `<meta name="theme-color">` |
Resolve any relative URLs (og:image, favicon, canonical) against the
final response URL (after redirects).
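A rough sketch of the "first present wins" rule plus relative-URL resolution, for three of the fields. It assumes the scraper has already collected tag values into a flat map keyed by property/name (`MetaTags` is a hypothetical shape, with `<title>` stored under `"title"`), and it leaves out the `<img>` fallback for `imageUrl`.

```typescript
// Hypothetical shape: scraped tag values keyed by property/name.
type MetaTags = Record<string, string | undefined>;

// Ordered sources per field, following the precedence table.
const SOURCES = {
  title: ["og:title", "twitter:title", "title"],
  description: ["og:description", "twitter:description", "description"],
  imageUrl: ["og:image:secure_url", "og:image", "twitter:image"],
} as const;

// Returns the first non-blank value among `keys`, or "" when none is present.
function firstPresent(meta: MetaTags, keys: readonly string[]): string {
  for (const key of keys) {
    const value = (meta[key] ?? "").trim();
    if (value !== "") return value;
  }
  return "";
}

// Resolves a possibly relative URL against the final (post-redirect) response URL.
function resolveUrl(value: string, finalResponseUrl: string): string {
  return value === "" ? "" : new URL(value, finalResponseUrl).toString();
}

function extractFields(meta: MetaTags, finalResponseUrl: string) {
  return {
    title: firstPresent(meta, SOURCES.title),
    description: firstPresent(meta, SOURCES.description),
    imageUrl: resolveUrl(firstPresent(meta, SOURCES.imageUrl), finalResponseUrl),
  };
}
```

Treating a whitespace-only value as absent is an assumption of this sketch; the spec only says "first present".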
Provider quirks worth handling
These quirks save a lot of "why doesn't this site preview?" debugging later.
- Twitter / X: `x.com` and `twitter.com` strip OG when not signed in. Use the public oEmbed endpoint `https://publish.twitter.com/oembed?url=...&omit_script=1` for Twitter/X URLs and map: `title = author_name`, `description = html` stripped to text, `imageUrl = thumbnail_url` if available.
- YouTube: prefer `https://noembed.com/embed?url=...` or `https://www.youtube.com/oembed?url=...&format=json` (no key).
- Reddit / Mastodon: standard OG works fine.
- Sites behind Cloudflare bot challenge: surface 502 to the client. Don't retry hot; let the negative-cache TTL absorb it.
- AMP pages: prefer `og:url` when present so the cached entry points to the canonical page, not the AMP variant.
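The Twitter/X mapping could look roughly like the following. `mapTwitterOEmbed` and `stripTags` are hypothetical helpers; the tag stripping is a naive regex that does not decode HTML entities, so treat it as a placeholder for a real HTML-to-text step.

```typescript
// Only the oEmbed fields the mapping above uses; all treated as optional.
type TwitterOEmbed = {
  author_name?: string;
  html?: string;
  thumbnail_url?: string;
};

// Naive tag removal: replaces tags with spaces, then collapses whitespace.
function stripTags(html: string): string {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

function mapTwitterOEmbed(embed: TwitterOEmbed) {
  return {
    title: embed.author_name ?? "",
    description: stripTags(embed.html ?? ""),
    imageUrl: embed.thumbnail_url ?? "", // empty when the response has no thumbnail
  };
}
```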
Front-end integration
Type addition (src/types/post.ts)
export type LinkPreview = {
url: string;
canonicalUrl: string;
siteName: string;
title: string;
description: string;
imageUrl?: string;
imageWidth?: number;
imageHeight?: number;
favicon?: string;
themeColor?: string;
};
export type Post = {
// ...existing fields
/** Preview for the first URL in `text`. At most one per post. */
linkPreview?: LinkPreview;
};
Which URL gets previewed
The back-end picks the first URL it finds in `text` using the same
regex as the front-end's autolink (`/(https?:\/\/[^\s<>"]+[^\s<>".,;:!?)\]}'])/i`).
Only that URL is fetched, stored, and returned as `post.linkPreview`. Any
later URLs in the same post are ignored for preview purposes (still
clickable inline via autolink).
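A minimal sketch of that selection, using the regex quoted above. `firstPreviewUrl` is a hypothetical name; the back-end may be written in another language, in which case the pattern would need porting with care around the trailing-punctuation class.

```typescript
// Same pattern as the front-end autolink: the last character of a match may not be
// whitespace, a quote, or trailing punctuation such as "." "," ")" "]".
const AUTOLINK_RE = /(https?:\/\/[^\s<>"]+[^\s<>".,;:!?)\]}'])/i;

// Returns the first URL in the post text, or null when there is none.
function firstPreviewUrl(text: string): string | null {
  const match = AUTOLINK_RE.exec(text);
  return match ? match[0] : null;
}
```

For example, in `"See https://example.com/a?b=1, and https://other.org."` only `https://example.com/a?b=1` is picked, without the trailing comma.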
Where data comes from
Two viable paths — pick one when wiring the back-end.
1. Inline on `Post` (preferred): the post API enriches each post with `linkPreview`. The first URL in `text` is resolved once at write time (or lazily on first read with a background job). The client renders without making any extra request.
2. Client-side lookup: the client extracts the first URL via the existing `autolink` regex, calls `/api/link-preview?url=...` once per post (with in-memory dedupe across posts that share the same URL), and renders the card when the response comes back. Slower first paint but keeps the posts endpoint cheap.
Recommend (1) for the public feed and keep /api/link-preview available for
(2) only on admin previews.
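For the client-side lookup path, the in-memory dedupe and the "every non-200 means no preview" rule might be sketched like this. `lookupPreview`, `HttpGet`, and `PreviewPayload` are hypothetical names, and the HTTP call is injected so the sketch does not depend on a particular fetch wrapper.

```typescript
// Minimal response shape; the browser's Response satisfies it structurally.
type HttpResponse = { status: number; json(): Promise<unknown> };
type HttpGet = (path: string) => Promise<HttpResponse>;
type PreviewPayload = Record<string, unknown>;

// One shared promise per URL, so posts that share a URL trigger a single request.
const previewsByUrl = new Map<string, Promise<PreviewPayload | null>>();

function lookupPreview(url: string, httpGet: HttpGet): Promise<PreviewPayload | null> {
  const cached = previewsByUrl.get(url);
  if (cached) return cached;

  // encodeURIComponent keeps query strings inside the target URL intact.
  const path = `/api/link-preview?url=${encodeURIComponent(url)}`;
  const pending = httpGet(path)
    .then((res) => (res.status === 200 ? (res.json() as Promise<PreviewPayload>) : null))
    .catch(() => null); // network errors also mean "no preview", never a toast

  previewsByUrl.set(url, pending);
  return pending;
}
```

In the browser this could be called as `lookupPreview(url, (path) => fetch(path))`. The map is never evicted here, which is acceptable for a sketch but would need a size cap in a long-lived session.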
Rendering
- New component: `src/components/messageStream/LinkPreviewCard.tsx`
- Renders a card with a left vertical 3px accent bar (`themeColor` → fallback `bg-ark-gold`).
- Layout:

      ┌──────────────────────────────────────────────────┐
      │ ▍ siteName (12px, neutral-400)                   │
      │ ▍ Title (15px, bold, neutral-100)                │
      │ ▍ Description (13px, neutral-300, 3-line clamp)  │
      │ ▍ ┌────────────────────────────────────────────┐ │
      │ ▍ │ imageUrl (lazy, aspect-video, rounded)     │ │
      │ ▍ └────────────────────────────────────────────┘ │
      └──────────────────────────────────────────────────┘

- Whole card is `<a href={canonicalUrl} target="_blank" rel="noopener noreferrer">`.
- Reuse the bubble background (`bg-[#272632]` is OK, slightly lift with `bg-white/[0.03]` overlay so the card reads as inset within the bubble).
- Mount points (text-bearing bubbles only): `TextBubble`, `ImageWithTextBubble`, `AlbumBubble`, `VideoBubble`, `FileDocBubble`. Render below the existing `CollapsibleText` so cards stay visible even when long text is collapsed.
Picking the URL to preview
If post.linkPreview is present, render that single card. Otherwise the
bubble renders nothing extra (URLs still autolink inline). The front-end
never picks the URL itself — that decision lives on the back-end so the
client and server agree on which URL was chosen.
Falling back gracefully
- No `imageUrl` → omit the image area, keep the text block.
- Title shorter than 8 characters → hide the description below (treat as a low-confidence preview).
- Title empty and description empty → render nothing.
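These fallback rules can be expressed as one pure decision function that the card consults before rendering. This is a sketch only: `cardLayout` and `CardLayout` are hypothetical names, and trimming whitespace before measuring the title is an assumption.

```typescript
// The subset of LinkPreview fields the fallback rules look at.
type PreviewText = { title: string; description: string; imageUrl?: string };
type CardLayout = { showImage: boolean; showDescription: boolean };

// Returns null when nothing should be rendered, otherwise which parts to show.
function cardLayout(preview: PreviewText): CardLayout | null {
  const title = preview.title.trim();
  const description = preview.description.trim();

  // Title empty and description empty: render nothing.
  if (title === "" && description === "") return null;

  return {
    // No imageUrl: omit the image area, keep the text block.
    showImage: Boolean(preview.imageUrl),
    // Title shorter than 8 characters: low-confidence preview, hide the description.
    showDescription: description !== "" && title.length >= 8,
  };
}
```

Applied literally, an empty title with a non-empty description yields a card with neither title nor description, so that case may deserve an explicit rule.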
Open questions for the back-end
- Where in the stack will OG extraction live? Existing post pipeline, a worker queue, or inline on read?
- Storage: a new `link_previews` table keyed by `canonicalUrl`, with a `post_link_previews` join table preserving original URL order, or just a JSON column on `posts`?
- How aggressive should re-scrape be? E.g. re-scrape every 30 days for successful previews, every 24 hours for `themeColor` updates.
- Should admin be able to override / hide a preview per post? Telegram has a "no preview" toggle and editors often want it.
- Do we want a manual "refresh preview" button in the admin UI?