Arkie-Library-Frontend/docs/link-preview.md
TerryM 29dc71d2dd feat(link-preview): frontend interface for Telegram-style URL preview
Adds the front-end side of the link-preview feature so the back-end
team has a fixed contract to implement against.

- docs/link-preview.md: full spec for the `/api/link-preview` proxy
  and the preferred inline-on-Post integration. Covers caching, SSRF
  guards, metadata-extraction precedence, provider quirks, and the
  front-end rendering rules. Scope is the first URL only.
- types/post.ts: new `LinkPreview` type and optional `linkPreview`
  field on `Post`.
- LinkPreviewCard: clickable card with a themeColor accent bar,
  siteName / title / description (line-clamped), and an optional
  1.91:1 thumbnail. Whole card is an `<a target="_blank">` to the
  canonical URL.
- MessageBubble: render the card between the bubble body and the
  timestamp, with padding that matches visual vs. text-only bubbles.
- mockPosts: example `linkPreview` payloads on p-005 and p-010 so
  the visual works when running with VITE_USE_MOCK_POSTS=true,
  and so the back-end has concrete reference values.
2026-05-30 01:40:00 +08:00


Link preview (/api/link-preview)

Telegram-style rich card for the first URL found in a post's text. The front-end renders a single clickable card showing site name, title, description, and a thumbnail; the data comes from a back-end proxy that scrapes Open Graph / oEmbed / Twitter Card metadata once and caches it.

Scope: only the first link in the post text gets a preview, matching Telegram's behaviour. Any additional URLs in the same post still render as inline autolinks but do not get their own card.

Why a back-end proxy

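Browsers cannot read arbitrary cross-origin pages (the same-origin policy blocks the response unless the target opts in via CORS), so OG metadata must be fetched server-side. A single proxy endpoint keeps secrets and outbound IPs on the server and lets us cache results, so each URL is scraped once per cache TTL for the whole audience rather than once per viewer.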


Endpoint contract

GET /api/link-preview?url=<encoded-absolute-url>
| Query | Required | Notes |
|---|---|---|
| url | yes | Absolute http:// or https:// URL. Must be percent-encoded (e.g. with encodeURIComponent) so that any query string inside the target URL survives the round trip. |
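Building the request URL on the client is short, but the encoding step is easy to get wrong. A minimal sketch, assuming a TypeScript client; the function name linkPreviewRequestPath is hypothetical, and returning null for non-http(s) input (instead of letting the server answer 400) is an assumption:

```ts
/** Proxy path for a target link, or null for anything that is not an absolute http(s) URL. */
export function linkPreviewRequestPath(target: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(target);
  } catch {
    return null; // not an absolute URL; the server would answer 400 invalid_url
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;
  // encodeURIComponent escapes ?, & and =, so the target's own query string
  // stays inside the single `url` parameter instead of leaking into ours.
  return `/api/link-preview?url=${encodeURIComponent(target)}`;
}
```

On a Node back-end, new URL(req.url, base).searchParams.get("url") decodes the parameter back to the original target.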

Success — 200 OK

```json
{
  "url": "https://app.safe.global/welcome",
  "canonicalUrl": "https://app.safe.global/welcome",
  "siteName": "app.safe.global",
  "title": "Safe{Wallet}",
  "description": "Safe{Wallet} is the most trusted smart account wallet on Ethereum with over $100B secured.",
  "imageUrl": "https://app.safe.global/og.png",
  "imageWidth": 1200,
  "imageHeight": 630,
  "favicon": "https://app.safe.global/favicon.ico",
  "themeColor": "#12FF80",
  "fetchedAt": "2026-05-29T10:00:00Z",
  "cacheTtlSeconds": 86400
}
```
  • All string fields except url and canonicalUrl may be empty. The front-end hides any row whose field is missing (e.g. no imageUrl → the image area is omitted).
  • url echoes the original input so the client can match each response to the URL it asked about, even when several requests are in flight and responses arrive out of order.
  • canonicalUrl is the URL the client opens when the card is tapped. It defaults to url when the page has no <link rel="canonical">.

Already cached / freshly cached — same shape

The endpoint is idempotent and the response shape is identical whether the metadata is hot, warm, or freshly scraped.

Errors

| Status | When | Body shape |
|---|---|---|
| 400 | Missing, invalid, or non-http(s) url | `{ "error": "invalid_url" }` |
| 422 | URL passed validation but resolves to a private/internal address (SSRF guard) | `{ "error": "blocked_target" }` |
| 404 | Target returned 404, or the fetch produced no metadata | `{ "error": "not_found" }` |
| 504 | Target did not respond within the timeout (a gateway timeout; 408 would mean the client was slow) | `{ "error": "timeout" }` |
| 502 | Target returned a 5xx | `{ "error": "upstream_error" }` |
| 429 | This client / IP exceeded the rate limit | `{ "error": "rate_limited", "retryAfter": 60 }` |

The front-end treats every non-200 as “no preview available” and silently renders nothing. No toasts. URLs already render as inline clickable text via autolink, so the user is never blocked.
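For the client-side lookup path, the "every non-200 means no preview" rule fits in one small wrapper. A sketch, assuming a fetch-compatible function is passed in (FetchLike is a hypothetical minimal type so the example needs no DOM typings) and that a response whose url does not echo the request is discarded as stale:

```ts
// Fields of a 200 body that the card uses (see the JSON example above).
type LinkPreviewBody = {
  url: string;
  canonicalUrl: string;
  siteName: string;
  title: string;
  description: string;
  imageUrl?: string;
  themeColor?: string;
};

// Minimal structural stand-in for fetch.
type FetchLike = (input: string) => Promise<{ status: number; json(): Promise<unknown> }>;

/** Preview for `target`, or null on any non-200, network error, or mismatched echo. */
export async function fetchLinkPreview(
  target: string,
  fetchImpl: FetchLike,
): Promise<LinkPreviewBody | null> {
  try {
    const res = await fetchImpl(`/api/link-preview?url=${encodeURIComponent(target)}`);
    if (res.status !== 200) return null; // 400/404/422/429/5xx all mean "no preview"
    const body = (await res.json()) as LinkPreviewBody;
    // `url` echoes the request; a body for a different link is ignored.
    return body.url === target ? body : null;
  } catch {
    return null; // network failures are silent too: the link still autolinks inline
  }
}
```

In the app this would be called as fetchLinkPreview(link, (u) => fetch(u)).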


Caching strategy

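Store one row per canonicalUrl (falling back to the normalized request url when the page declares no canonical). Because canonicalUrl is only known after a fetch, also index each row by the normalized request url so repeat requests hit the cache before any outbound request. Suggested TTLs:

  • Successful preview: 24 hours (cacheTtlSeconds: 86400).
  • 404 / timeout / blocked: 6 hours negative cache. Without it, every view of a link to a failing site triggers another outbound fetch.
  • On successful responses, send Cache-Control: public, max-age=86400 so CDNs and browsers cache them too.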
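The TTL rules reduce to a small lookup on the scrape outcome. A sketch; ScrapeOutcome and the helper names are hypothetical. 400 and 429 are left out because they describe the caller rather than the target, upstream_error is treated as negative per the Cloudflare note under provider quirks, and sending the 6-hour max-age on negative responses is an assumption (this section only specifies the header for successes):

```ts
// Outcomes of one scrape, named after the contract's error codes.
type ScrapeOutcome = "ok" | "not_found" | "timeout" | "blocked_target" | "upstream_error";

const SUCCESS_TTL_SECONDS = 24 * 60 * 60; // 86400, echoed to clients as cacheTtlSeconds
const NEGATIVE_TTL_SECONDS = 6 * 60 * 60; // 21600

export function cacheTtlSeconds(outcome: ScrapeOutcome): number {
  return outcome === "ok" ? SUCCESS_TTL_SECONDS : NEGATIVE_TTL_SECONDS;
}

// Assumption: negative responses advertise their 6-hour TTL as well.
export function cacheControlHeader(outcome: ScrapeOutcome): string {
  return `public, max-age=${cacheTtlSeconds(outcome)}`;
}

/** True while a cached row (stored with its fetchedAt timestamp) is still within its TTL. */
export function isFresh(fetchedAtIso: string, ttlSeconds: number, now: Date = new Date()): boolean {
  const ageMs = now.getTime() - Date.parse(fetchedAtIso);
  // A timestamp in the future (clock skew) is treated as stale.
  return ageMs >= 0 && ageMs < ttlSeconds * 1000;
}
```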

Cache key normalization:

  • Lowercase scheme + host.
  • Strip the trailing slash only when the path is exactly "/", so https://example.com/ and https://example.com share one key.
  • Strip utm_*, ref, referrer, fbclid, gclid query params.
  • Keep the rest of the query and fragment as-is.
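The normalization rules can be sketched with the WHATWG URL parser (global in browsers and Node). normalizeCacheKey is a hypothetical name. The sketch filters the raw query string rather than round-tripping it through URLSearchParams, because URLSearchParams re-serializes the params it keeps (a space sent as %20 comes back as +), which would break "keep the rest of the query as-is". Dropping userinfo (user:pass@) from the key is an assumption:

```ts
// Tracking params named in the rules; anything starting with utm_ is dropped too.
const TRACKING_PARAMS = new Set(["ref", "referrer", "fbclid", "gclid"]);

/** Cache key for a request URL. Throws TypeError if `raw` is not an absolute URL. */
export function normalizeCacheKey(raw: string): string {
  const u = new URL(raw);
  // The URL parser already lowercases the scheme and host of http(s) URLs.
  const path = u.pathname === "/" ? "" : u.pathname;

  // Filter raw "name=value" pairs so survivors keep their order and encoding.
  const kept = u.search
    .slice(1) // drop the leading "?"
    .split("&")
    .filter((pair) => {
      if (pair === "") return false;
      const name = pair.split("=", 1)[0];
      return !name.startsWith("utm_") && !TRACKING_PARAMS.has(name);
    });
  const search = kept.length > 0 ? `?${kept.join("&")}` : "";

  // Fragment is kept as-is per the rules; userinfo is not part of the key.
  return `${u.protocol}//${u.host}${path}${search}${u.hash}`;
}
```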

SSRF and abuse guard (must-have)

The proxy fetches whatever URL a client asks for, so without guards it can be turned against internal services (SSRF). Before issuing the outbound request:

  1. Resolve the host to all of its A/AAAA records.
  2. Reject if any resolved IP is in: loopback, link-local, private (RFC1918), 0.0.0.0/8, multicast, broadcast, or the internal cluster CIDR.
  3. Reject schemes other than http and https.
  4. Cap response body at 5 MB; abort on overflow.
  5. Cap request total time at 5 s; abort on timeout.
  6. Cap redirect chain at 3 hops; re-validate target IP at each hop.
  7. Do not forward client cookies, auth headers, or Referer to the target.
  8. Use a clear User-Agent such as ArkLibraryLinkBot/1.0 (+https://ark-library.com/bot).
  9. Per-client (IP or session) rate limit, e.g. 60 req / min.
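Steps 1 and 2 map onto Node's net.BlockList. A sketch assuming a Node.js back-end (the back-end stack is still an open question); the ranges are the ones from step 2 plus their IPv6 counterparts (::1, fe80::/10, fc00::/7, ff00::/8), and the internal cluster CIDR is left as a placeholder because it is deployment-specific. One caveat: if the HTTP client resolves the hostname again after this check, DNS rebinding can swap in a private address, so connect to the address that was validated, and repeat the check on every redirect hop (step 6).

```ts
import { BlockList, isIP } from "node:net";
import { lookup } from "node:dns/promises";

// Step 2 ranges; IPv6 entries cover AAAA records from step 1.
const blocked = new BlockList();
blocked.addSubnet("0.0.0.0", 8, "ipv4"); // "this network"
blocked.addSubnet("127.0.0.0", 8, "ipv4"); // loopback
blocked.addSubnet("10.0.0.0", 8, "ipv4"); // RFC 1918
blocked.addSubnet("172.16.0.0", 12, "ipv4"); // RFC 1918
blocked.addSubnet("192.168.0.0", 16, "ipv4"); // RFC 1918
blocked.addSubnet("169.254.0.0", 16, "ipv4"); // link-local, e.g. cloud metadata at 169.254.169.254
blocked.addSubnet("224.0.0.0", 4, "ipv4"); // multicast
blocked.addAddress("255.255.255.255", "ipv4"); // limited broadcast
blocked.addAddress("::", "ipv6"); // unspecified
blocked.addAddress("::1", "ipv6"); // loopback
blocked.addSubnet("fe80::", 10, "ipv6"); // link-local
blocked.addSubnet("fc00::", 7, "ipv6"); // unique local (private)
blocked.addSubnet("ff00::", 8, "ipv6"); // multicast
// Internal cluster CIDR (placeholder, deployment-specific):
// blocked.addSubnet("<cluster-base>", <prefix>, "ipv4");

/** True when `address` must not be fetched. Non-IP input fails closed. */
export function isBlockedAddress(address: string): boolean {
  const family = isIP(address);
  if (family === 0) return true;
  return blocked.check(address, family === 4 ? "ipv4" : "ipv6");
}

/** Steps 1 and 2: resolve all A/AAAA records and reject the host if any is blocked. */
export async function hostIsAllowed(hostname: string): Promise<boolean> {
  const records = await lookup(hostname, { all: true });
  return records.length > 0 && records.every((r) => !isBlockedAddress(r.address));
}
```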

Metadata extraction precedence

For each field, pick the first present:

| Field | Sources (in order) |
|---|---|
| title | og:title → twitter:title → `<title>` → empty |
| description | og:description → twitter:description → `<meta name="description">` → empty |
| imageUrl | og:image:secure_url → og:image → twitter:image → first prominent `<img>` (skip images smaller than 200×200) → empty |
| siteName | og:site_name → application-name → hostname (without a leading www.) |
| canonicalUrl | `<link rel="canonical">` → request URL |
| favicon | `<link rel="icon">` → `<link rel="shortcut icon">` → /favicon.ico |
| themeColor | `<meta name="theme-color">` |

Resolve any relative URLs (og:image, favicon, canonical) against the final response URL (after redirects).
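The precedence table is a chain of "first non-empty" lookups. A sketch that assumes an HTML parser has already collected tag values into a flat object; the key names (title for the `<title>` text, link:canonical, link:icon, img:prominent for an already size-filtered `<img>`, and so on) are hypothetical. Relative URLs resolve against the final response URL, while the canonical fallback uses the request URL, per the table:

```ts
// Hypothetical parser output keyed by tag name, e.g. meta["og:title"], meta["title"].
type RawMeta = Partial<Record<string, string>>;

// First key whose trimmed value is non-empty, else "".
function first(meta: RawMeta, keys: string[]): string {
  for (const key of keys) {
    const value = meta[key]?.trim();
    if (value) return value;
  }
  return "";
}

// Resolve a possibly relative URL against the final response URL; "" if empty or unparsable.
function absolutize(value: string, finalUrl: string): string {
  if (value === "") return "";
  try {
    return new URL(value, finalUrl).href;
  } catch {
    return "";
  }
}

export function extractPreview(meta: RawMeta, requestUrl: string, finalUrl: string) {
  const hostname = new URL(finalUrl).hostname.replace(/^www\./, "");
  return {
    title: first(meta, ["og:title", "twitter:title", "title"]),
    description: first(meta, ["og:description", "twitter:description", "description"]),
    // "img:prominent" = first <img> of at least 200×200, chosen by the parser.
    imageUrl: absolutize(first(meta, ["og:image:secure_url", "og:image", "twitter:image", "img:prominent"]), finalUrl),
    siteName: first(meta, ["og:site_name", "application-name"]) || hostname,
    canonicalUrl: absolutize(first(meta, ["link:canonical"]), finalUrl) || requestUrl,
    favicon: absolutize(first(meta, ["link:icon", "link:shortcut icon"]) || "/favicon.ico", finalUrl),
    themeColor: first(meta, ["theme-color"]),
  };
}
```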


Provider quirks worth handling

Handling these up front saves a lot of "why doesn't this site preview?" debugging later.

  • Twitter / X: x.com and twitter.com strip OG when not signed in. Use the public oEmbed endpoint https://publish.twitter.com/oembed?url=...&omit_script=1 for Twitter/X URLs and map: title = author_name, description = html stripped to text, imageUrl = thumbnail_url if available.
  • YouTube: prefer https://noembed.com/embed?url=... or https://www.youtube.com/oembed?url=...&format=json (no key).
  • Reddit / Mastodon: standard OG works fine.
  • Sites behind a Cloudflare bot challenge: surface 502 to the client. Don't retry immediately; let the negative-cache TTL absorb it.
  • AMP pages: prefer og:url when present so the cached entry points to the canonical page, not the AMP variant.
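Routing the two oEmbed providers and mapping the Twitter/X response can be sketched as follows. The endpoint URLs are the ones listed above; the host sets (including youtu.be) and the regex tag-stripping are illustrative assumptions, and a production version would decode HTML entities with a real parser:

```ts
// Hosts routed to oEmbed instead of OG scraping (illustrative lists).
const TWITTER_HOSTS = new Set(["x.com", "twitter.com"]);
const YOUTUBE_HOSTS = new Set(["youtube.com", "youtu.be"]);

/** oEmbed endpoint for `target`, or null when standard OG scraping applies. */
export function oembedEndpointFor(target: string): string | null {
  const host = new URL(target).hostname.replace(/^www\./, "");
  const q = encodeURIComponent(target);
  if (TWITTER_HOSTS.has(host)) return `https://publish.twitter.com/oembed?url=${q}&omit_script=1`;
  if (YOUTUBE_HOSTS.has(host)) return `https://www.youtube.com/oembed?url=${q}&format=json`;
  return null;
}

// The Twitter/X oEmbed fields named in the mapping above.
type TwitterOembed = { author_name?: string; html?: string; thumbnail_url?: string };

export function mapTwitterOembed(o: TwitterOembed) {
  return {
    title: o.author_name ?? "",
    // Naive tag strip for illustration; HTML entities are left undecoded.
    description: (o.html ?? "").replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim(),
    imageUrl: o.thumbnail_url ?? "",
  };
}
```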

Front-end integration

Type addition (src/types/post.ts)

```ts
export type LinkPreview = {
  url: string;
  canonicalUrl: string;
  siteName: string;
  title: string;
  description: string;
  imageUrl?: string;
  imageWidth?: number;
  imageHeight?: number;
  favicon?: string;
  themeColor?: string;
};

export type Post = {
  // ...existing fields
  /** Preview for the first URL in `text`. At most one per post. */
  linkPreview?: LinkPreview;
};
```

Which URL gets previewed

The back-end picks the first URL it finds in text using the same regex as the front-end's autolink (/(https?:\/\/[^\s<>"]+[^\s<>".,;:!?)\]}'])/i). Only that URL is fetched, stored, and returned as post.linkPreview. Any later URLs in the same post are ignored for preview purposes (still clickable inline via autolink).
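A sketch of picking that URL, using the regex quoted above verbatim. Without the g flag, String.prototype.match returns only the first match, which is exactly the "first URL" rule. The function name is hypothetical; a back-end in another language can port the pattern as long as it stays identical:

```ts
// The autolink pattern from this section, unchanged. No `g` flag: first match only.
const AUTOLINK_RE = /(https?:\/\/[^\s<>"]+[^\s<>".,;:!?)\]}'])/i;

/** First URL in a post's text, or null when there is none. */
export function firstPreviewUrl(text: string): string | null {
  const m = text.match(AUTOLINK_RE);
  return m ? m[1] : null;
}
```

The trailing character class keeps sentence punctuation out of the URL, so "https://example.com/a)." yields https://example.com/a.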

Where data comes from

Two viable paths — pick one when wiring the back-end.

  1. Inline on Post (preferred): the post API enriches each post with linkPreview. The first URL in text is resolved once at write time (or lazily on first read with a background job). The client renders without making any extra request.
  2. Client-side lookup: the client extracts the first URL via the existing autolink regex, calls /api/link-preview?url=... once per post (with in-memory dedupe across posts that share the same URL), and renders the card when the response comes back. Slower first paint but keeps the posts endpoint cheap.

Recommend (1) for the public feed and keep /api/link-preview available for (2) only on admin previews.
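For path (2), the in-memory dedupe is a map from URL to the pending promise. A sketch with a hypothetical createPreviewLoader; it assumes fetchOne never rejects (for example, a wrapper that maps every failure to null), so a failed lookup is remembered for the rest of the session instead of being retried by every post that shares the URL:

```ts
/**
 * Dedupes lookups across posts sharing a URL: the first call starts the request,
 * later calls for the same URL get the same promise back.
 */
export function createPreviewLoader<T>(fetchOne: (url: string) => Promise<T | null>) {
  const inflight = new Map<string, Promise<T | null>>();
  return (url: string): Promise<T | null> => {
    let p = inflight.get(url);
    if (p === undefined) {
      p = fetchOne(url);
      inflight.set(url, p);
    }
    return p;
  };
}
```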

Rendering

  • New component: src/components/messageStream/LinkPreviewCard.tsx

  • Renders a card with a left vertical 3px accent bar (themeColor → fallback bg-ark-gold).

  • Layout:

    ┌──────────────────────────────────────────────────┐
    │ ▍ siteName (12px, neutral-400)                   │
    │ ▍ Title (15px, bold, neutral-100)                │
    │ ▍ Description (13px, neutral-300, 3-line clamp)  │
    │ ▍ ┌────────────────────────────────────────────┐ │
    │ ▍ │ imageUrl (lazy, aspect-video, rounded)     │ │
    │ ▍ └────────────────────────────────────────────┘ │
    └──────────────────────────────────────────────────┘
    
  • Whole card is <a href={canonicalUrl} target="_blank" rel="noopener noreferrer">.

  • Reuse the bubble background (bg-[#272632]) and lift it slightly with a bg-white/[0.03] overlay so the card reads as inset within the bubble.

  • Mount points (text-bearing bubbles only): TextBubble, ImageWithTextBubble, AlbumBubble, VideoBubble, FileDocBubble. Render below the existing CollapsibleText so cards stay visible even when long text is collapsed.

Picking the URL to preview

If post.linkPreview is present, render that single card. Otherwise the bubble renders nothing extra (URLs still autolink inline). The front-end never picks the URL itself — that decision lives on the back-end so the client and server agree on which URL was chosen.

Falling back gracefully

  • No imageUrl → omit the image area, keep the text block.
  • Title shorter than 8 characters → hide the description below (treat as a low-confidence preview).
  • Title empty and description empty → render nothing.
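The three fallback rules, plus the themeColor accent fallback, can live in a pure function that LinkPreviewCard calls before rendering, which keeps them unit-testable without React. A sketch; PreviewFields, CardModel and toCardModel are hypothetical names. It applies the rules literally, so an empty title (length 0) also hides the description; adjust if that is not the intent:

```ts
// Fields the card reads, mirroring LinkPreview in src/types/post.ts.
type PreviewFields = {
  siteName: string;
  title: string;
  description: string;
  imageUrl?: string;
  themeColor?: string;
};

type CardModel = {
  siteName: string;
  title: string;
  description: string | null; // null → description row hidden
  imageUrl: string | null; // null → image area omitted
  accentColor: string | null; // null → use the bg-ark-gold class
};

/** Applies the fallback rules; null means render nothing. */
export function toCardModel(p: PreviewFields): CardModel | null {
  const title = p.title.trim();
  const description = p.description.trim();
  if (title === "" && description === "") return null;
  const lowConfidence = title.length < 8;
  return {
    siteName: p.siteName,
    title,
    description: lowConfidence || description === "" ? null : description,
    imageUrl: p.imageUrl ? p.imageUrl : null,
    accentColor: p.themeColor ? p.themeColor : null,
  };
}
```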

Open questions for the back-end

  • Where in the stack will OG extraction live? Existing post pipeline, a worker queue, or inline on read?
  • Storage: a new link_previews table keyed by canonicalUrl, with a post_link_previews join table preserving original URL order, or just a JSON column on posts?
  • How aggressive should re-scrape be? E.g. re-scrape every 30 days for successful previews, every 24 hours for themeColor updates.
  • Should admin be able to override / hide a preview per post? Telegram has a "no preview" toggle and editors often want it.
  • Do we want a manual "refresh preview" button in the admin UI?