{"id":3047,"date":"2026-07-01T15:50:41","date_gmt":"2026-07-01T12:50:41","guid":{"rendered":"https:\/\/shareai.now\/?p=3047"},"modified":"2026-07-01T15:50:42","modified_gmt":"2026-07-01T12:50:42","slug":"dinh-tuyen-bo-nho-dem-kv-llm-dien-truoc","status":"publish","type":"post","link":"https:\/\/shareai.now\/vi\/blog\/nha-phat-trien\/dinh-tuyen-bo-nho-dem-kv-llm-dien-truoc\/","title":{"rendered":"KV Cache Routing: Cut Redundant LLM Prefill Work"},"content":{"rendered":"<p>KV cache routing matters when the same prompt prefixes keep showing up in your LLM traffic. If the right request reaches the right replica, the serving engine can reuse cached attention state instead of recomputing the same prefill tokens over and over.<\/p>\n\n\n\n<p>That sounds like an infrastructure detail, but it quickly becomes a product problem. Long system prompts, RAG context, few-shot examples, and multi-turn chat history can all make prefill work expensive. 
When every replica recomputes the same prefix, teams pay for it in latency, GPU time, and capacity planning.<\/p>\n\n\n\n<p>ShareAI gives developers one API for 150+ models, along with marketplace visibility, routing, and failover. KV cache routing sits one layer lower, inside the model-serving infrastructure. The useful takeaway for ShareAI readers is simple: routing decisions matter at every layer of the AI stack, from choosing a model to deciding which GPU replica handles a repeated prompt.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why KV Cache Routing Matters<\/h2>\n\n\n\n<p>During LLM inference, a model first processes the input prompt in the prefill phase. 
It builds a key-value cache, usually called the KV cache, so that tokens generated afterward can refer back to the context that has already been processed.<\/p>\n\n\n\n<p>Prefix caching lets serving engines reuse that cache when a later request shares the same beginning of the prompt. <a href=\"https:\/\/docs.vllm.ai\/en\/v0.18.1\/features\/automatic_prefix_caching\/?utm_source=shareai.now&#038;utm_medium=content&#038;utm_campaign=kv-cache-routing-llm-prefill\">vLLM's automatic prefix caching documentation<\/a> describes this as reusing the KV cache for shared prefixes so a new request can skip computation for the shared part. 
<a href=\"https:\/\/sgl-project-sglang-93.mintlify.app\/concepts\/prefix-caching?utm_source=shareai.now&#038;utm_medium=content&#038;utm_campaign=kv-cache-routing-llm-prefill\">SGLang prefix caching<\/a> uses a related idea to share the KV cache across common token sequences.<\/p>\n\n\n\n<p>This matters most for workloads where many requests start the same way: support agents with a large system prompt, RAG applications that reuse the same document chunks, coding agents with repository instructions, or chat products that carry conversation history across turns.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">When Round-Robin Breaks Down<\/h2>\n\n\n\n<p>Prefix caching is easiest on a single replica. The same process sees the repeated prefix and can reuse its cache if memory is available. 
The problem appears when the service scales horizontally.<\/p>\n\n\n\n<p>With a standard round-robin load balancer, the first request may warm the cache on replica A, while the second request with the same prefix lands on replica B. Replica B does not have that cached state, so it recomputes the same prefill work. The third request may land on replica C and miss again.<\/p>\n\n\n\n<p>As the replica count grows, naive load balancing can spread related requests across more machines. The model-serving fleet may look balanced, but the prefix cache hit rate drops. That is the gap KV cache routing tries to close.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Three Practical Levels of Routing<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1. Session-Based Routing<\/h3>\n\n\n\n<p>Session-based routing sends traffic from the same user, workspace, tenant, or conversation to the same replica. 
It is the simplest place to start for multi-turn chat because follow-up prompts usually share the earlier context.<\/p>\n\n\n\n<p>The tradeoff is that user identity is not always the same thing as prompt similarity. Two users can share the same long system prompt and still be routed to different replicas. Session-based routing can also be disrupted when replicas are added or removed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. Prefix-Hash Routing<\/h3>\n\n\n\n<p>Prefix-hash routing uses the prompt itself as the routing key. The router hashes the stable beginning of the prompt and sends matching prefixes to the same replica.<\/p>\n\n\n\n<p>This works better when repeated system prompts, few-shot examples, or shared retrieved context matter more than user identity. The hard part is choosing the prefix boundary. 
If the hash includes timestamps, request IDs, or user-specific fields, the routing key fragments and cache reuse breaks down.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3. Cache-Event-Aware Routing<\/h3>\n\n\n\n<p>The most advanced approach tracks which cache blocks reside on which replica, then routes each request to the replica with the best cache overlap while still taking load into account. <a href=\"https:\/\/github.com\/llm-d\/llm-d-router?utm_source=shareai.now&#038;utm_medium=content&#038;utm_campaign=kv-cache-routing-llm-prefill\">The llm-d router project<\/a> describes an endpoint picker that considers KV cache locality, current load, and priority when choosing where a request should go.<\/p>\n\n\n\n<p>This is more complex, but it is the right direction for high-throughput fleets where cache misses are measured, expensive, and frequent.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">When to Skip It<\/h2>\n\n\n\n<p>KV cache routing is not automatically worth the complexity. It is a poor fit when prompts are short, mostly unique, or processed in batches with little repeated structure.<\/p>\n\n\n\n<p>Document summarization, creative generation, one-off extraction, and many asynchronous batch jobs may not have enough shared prefix overlap to justify cache-aware routing. In those cases, ordinary load balancing can be the cleaner choice.<\/p>\n\n\n\n<p>The practical test is measurement: cache hit rate, time to first token, throughput, queue depth, GPU memory pressure, and cost per completed task. If cache-aware routing does not move those numbers, fix the prompt structure first.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How This Fits With ShareAI<\/h2>\n\n\n\n<p>ShareAI is an AI marketplace and API, not a model-serving load balancer inside your GPU cluster. 
Developers use ShareAI to access many models through one API, compare marketplace signals, route requests, manage usage, and fail over when a route degrades.<\/p>\n\n\n\n<p>KV cache routing is still relevant. If you run your own inference stack, it helps you ask better infrastructure questions. If you use hosted models, it helps you evaluate why two routes with similar model names can behave differently under real workloads.<\/p>\n\n\n\n<p>For Builders, this also ties into pricing. An app with long prompts, repeated RAG context, or agent loops can generate very uneven AI usage. 
ShareAI Builder lets app owners route AI inference traffic through ShareAI, set a margin or markup, have customers pay for usage routed through ShareAI, and receive monthly payouts based on the usage generated. The app itself is still built outside ShareAI.<\/p>\n\n\n\n<p>For model selection and route evaluation, start with the <a href=\"https:\/\/shareai.now\/models\/?utm_source=blog&#038;utm_medium=content&#038;utm_campaign=kv-cache-routing-llm-prefill\">ShareAI model marketplace<\/a>. For implementation basics, use the <a href=\"https:\/\/shareai.now\/docs\/api\/using-the-api\/getting-started-with-shareai-api\/?utm_source=blog&#038;utm_medium=content&#038;utm_campaign=kv-cache-routing-llm-prefill\">ShareAI API reference<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">KV Cache Routing Checklist<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>Put stable prompt content first: system prompt, tool rules, examples, and repeated context.<\/li><li>Move dynamic fields later: timestamps, request IDs, user-specific details, and one-off instructions.<\/li><li>Measure cache hit rate before and after any routing change.<\/li><li>Track time to first token, throughput, queue depth, and VRAM pressure together.<\/li><li>Start with prefix-hash routing before building cache-event-aware routing.<\/li><li>Split routing rules by workload instead of forcing one global policy.<\/li><li>Keep cost and latency visible at the application level, not only inside the inference cluster.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Frequently Asked Questions<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What is KV cache routing?<\/h3>\n\n\n<p>KV cache routing is a routing strategy that sends requests with repeated prompt prefixes to replicas that are likely to already hold the matching KV cache. The goal is to reduce redundant prefill computation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How is KV cache routing different from prefix caching?<\/h3>\n\n\n<p>Prefix caching is the serving engine's ability to reuse cached state for shared prompt prefixes. 
KV cache routing is the traffic distribution strategy that gets matching requests to the place where that cached state already exists.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Why does round-robin routing hurt prefix caching?<\/h3>\n\n\n<p>Round-robin routing spreads requests across replicas without knowing which replica holds the cached prefix. A repeated prompt can miss the cache simply because it landed on a different replica.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Which workloads benefit most from KV cache routing?<\/h3>\n\n\n<p>Multi-turn chat, RAG, coding agents, support agents, few-shot prompting, and applications with long shared system prompts are the strongest candidates because they reuse substantial prompt prefixes.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">When should a team skip KV cache routing?<\/h3>\n\n\n<p>Skip it when prompts are short, mostly unique, or batched with little repeated structure. 
In those cases, the routing complexity may not deliver meaningful value.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Do vLLM and SGLang support prefix caching?<\/h3>\n\n\n<p>Yes. The vLLM documentation covers automatic prefix caching, and the SGLang documentation covers prefix caching for KV cache shared across common token sequences. The serving engine still needs routing support when multiple replicas are involved.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Is KV cache routing the same as semantic caching?<\/h3>\n\n\n<p>No. KV cache routing works with exact or structurally similar prefix reuse inside inference serving. Semantic caching stores and reuses responses or intermediate results based on meaning, usually with embeddings or similarity thresholds.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Does ShareAI replace a KV-cache-aware load balancer?<\/h3>\n\n\n<p>No. ShareAI is an AI marketplace and API layer for model access, routing, failover, usage, and billing. 
KV-cache-aware routing is low-level model-serving infrastructure for teams that operate inference replicas.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How should builders think about KV cache routing?<\/h3>\n\n\n<p>Builders should treat cache behavior as a cost factor in AI-heavy applications. If their app has uneven usage, ShareAI can help route and monetize that AI traffic while the app is still built and owned outside ShareAI.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What should teams measure before changing routing?<\/h3>\n\n\n<p>Measure cache hit rate, time to first token, throughput, queue depth, VRAM pressure, cost per task, and output quality. 
A routing change should improve the workload, not just the dashboard.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Can KV cache routing reduce AI API costs?<\/h3>\n\n\n<p>It can reduce infrastructure costs for teams that serve models themselves, because less redundant prefill work can improve GPU efficiency. For hosted APIs, the effect depends on whether the provider exposes those savings in pricing or performance.<\/p>","protected":false},"excerpt":{"rendered":"<p>KV cache routing sends repeated prompt prefixes to replicas that can reuse cached attention state, helping teams cut redundant LLM prefill work.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"cta-title":"Explore AI Models","cta-description":"Compare price, latency, and availability across providers.","cta-button-text":"Browse Models","cta-button-link":"https:\/\/shareai.now\/models\/?utm_source=blog&utm_medium=content&utm_campaign=kv-cache-routing-llm-prefill","rank_math_title":"KV Cache Routing: Cut Redundant LLM Prefill Work","rank_math_description":"KV cache routing sends repeated prompt 
prefixes to the right replica so LLM teams can reduce redundant prefill work and latency.","rank_math_focus_keyword":"KV cache routing, prefix-aware routing, prefix caching, LLM inference optimization","footnotes":""},"categories":[4,6],"tags":[176,173,175,174,178,177],"class_list":["post-3047","post","type-post","status-publish","format-standard","hentry","category-developers","category-insights","tag-ai-routing","tag-kv-cache-routing","tag-llm-inference","tag-prefix-caching","tag-sglang","tag-vllm"],"_links":{"self":[{"href":"https:\/\/shareai.now\/vi\/api\/wp\/v2\/posts\/3047","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/shareai.now\/vi\/api\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/shareai.now\/vi\/api\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/shareai.now\/vi\/api\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/shareai.now\/vi\/api\/wp\/v2\/comments?post=3047"}],"version-history":[{"count":1,"href":"https:\/\/shareai.now\/vi\/api\/wp\/v2\/posts\/3047\/revisions"}],"predecessor-version":[{"id":3089,"href":"https:\/\/shareai.now\/vi\/api\/wp\/v2\/posts\/3047\/revisions\/3089"}],"wp:attachment":[{"href":"https:\/\/shareai.now\/vi\/api\/wp\/v2\/media?parent=3047"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/shareai.now\/vi\/api\/wp\/v2\/categories?post=3047"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/shareai.now\/vi\/api\/wp\/v2\/tags?post=3047"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}