LiveBench: An Overview

2024/08/08

$\boxed{\text{\large L\normalsize ive\large B\normalsize ench}}$ is perhaps the least bad LLM benchmark. It’s not too easy, it covers a relatively diverse set of tasks, and it avoids the pitfall of models overfitting to the test set by updating once each month. It has an expansive collection of models and regularly adds new ones. It is also the benchmark that most reliably aligns with my subjective evaluation of each model.

This page is designed to make the contents of LiveBench more easily accessible. Each section covers one task in LiveBench: some basic information, an example prompt (and its answer), and a simple plot of how various models perform on that type of task. A model’s score on a task is the percentage of that task’s questions it completed correctly.

Some of the examples are truncated or modified slightly to improve the formatting and readability of this article. For the exact contents of each prompt, please see the LiveBench datasets on Hugging Face.

I will update the plots on this page periodically and add new tasks when they are added to LiveBench.

LiveBench has 6 categories and 18 tasks, with a total of 1000 questions. Here’s a summary:

Task Name	Question Count	Description	Category
$\texttt{web\textunderscore of\textunderscore lies\textunderscore v2}$	50	Tedious logic puzzles about people who either lie or tell the truth	Reasoning
$\texttt{zebra\textunderscore puzzle}$	50	Logic puzzles with abstract and open-ended directional relations	Reasoning
$\texttt{spatial}$	50	Spatial reasoning word problems about cutting 2D and 3D shapes	Reasoning
$\texttt{LCB\textunderscore generation}$	78	40 LeetCode problems and 38 AtCoder problems	Coding
$\texttt{coding\textunderscore completion}$	50	Fill-in-the-blank LeetCode problems	Coding
$\texttt{math\textunderscore comp}$	96	Problems from regional high school math competitions	Mathematics
$\texttt{olympiad}$	36	Problems from national and international math olympiads (USAMO, IMO)	Mathematics
$\texttt{AMPS\textunderscore Hard}$	100	Procedurally generated math problems in $\LaTeX$: derivatives, integrals, completing the square, factoring	Mathematics
$\texttt{cta}$	50	Problems of picking the best name for a column from a list (based on its values)	Data Analysis
$\texttt{tablejoin}$	50	Problems of creating a mapping of columns with similar data (based on their values)	Data Analysis
$\texttt{tablereformat}$	50	Problems of converting an HTML table to JSON	Data Analysis
$\texttt{connections}$	50	NYT Connections problems (with varying difficulty)	Language
$\texttt{plot\textunderscore unscrambling}$	40	Problems of putting sentences in an IMDB movie description in the correct order	Language
$\texttt{typos}$	50	Problems of fixing typos and spelling mistakes in text	Language
$\texttt{summarize}$	50	Tasks of summarizing the beginning of a Guardian article, with precise criteria	Instruction Following
$\texttt{paraphrase}$	50	Tasks of paraphrasing the beginning of a Guardian article, with precise criteria	Instruction Following
$\texttt{simplify}$	50	Tasks of simplifying the beginning of a Guardian article, with precise criteria	Instruction Following
$\texttt{story\textunderscore generation}$	50	Tasks of writing a story about the beginning of a Guardian article, with precise criteria	Instruction Following

Now, let’s look at each task.

$\texttt{average}$ (unofficial)

This is simply an average of the model’s scores across all tasks. All tasks contribute equally to the average.
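To make "contribute equally" concrete, here is a minimal sketch of an unweighted mean over tasks. The scores are made-up placeholders, not real results, and the function name is my own. Because the mean is taken over tasks rather than over questions, a 36-question task counts exactly as much as a 100-question one.

```python
def unofficial_average(task_scores):
    """Unweighted mean of per-task scores: each task counts once."""
    return sum(task_scores.values()) / len(task_scores)

# Hypothetical per-task scores (percent correct), for illustration only.
example_scores = {"AMPS_Hard": 60.0, "olympiad": 30.0, "typos": 90.0}
print(unofficial_average(example_scores))  # 60.0
```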

$\texttt{AMPS\textunderscore Hard}$

100 procedurally generated math problems written in $\LaTeX$, covering several types: taking the derivative or integral of a function, completing the square, and factoring a polynomial. Inspired by the MATH and AMPS datasets.

Example:

Differentiate the following function: $-2 x+\tan \left(\frac{9}{2}-\frac{17 x}{2}\right)+\frac{3}{2}$. Please put your final answer in a $\texttt{\textbackslash boxed\{\}}$.

Answer:

$-\frac{17}{2} \sec ^2\left(\frac{1}{2} (9-17 x)\right)-2$
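The answer follows from the chain rule applied to the tangent term (the constant $\frac{3}{2}$ drops out). Spelled out:

$$\frac{d}{dx}\left[-2x+\tan\left(\frac{9}{2}-\frac{17x}{2}\right)+\frac{3}{2}\right] = -2+\sec^2\left(\frac{9}{2}-\frac{17x}{2}\right)\cdot\left(-\frac{17}{2}\right) = -\frac{17}{2}\sec^2\left(\frac{1}{2}(9-17x)\right)-2$$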

$\texttt{coding\textunderscore completion}$

50 relatively recent LeetCode completion problems of Easy, Medium, and Hard difficulty, from LiveCodeBench.

“Completion” means the model is given part of the solution and must write ONLY the part that follows. Sometimes the prompt includes just the class and function declarations, sometimes there is a (partial) docstring, and sometimes it includes parts of the actual solution.

LiveBench concatenates the provided partial solution with the model’s answer, runs that code, and checks if it passes all the test cases.

Example (LeetCode 2962, Medium difficulty):

Instructions: You are an expert Python programmer. You will be given a question (problem specification) and the first lines of Python solution to this problem, and will write in Python the remaining lines of the program to produce a correct Python program that matches the specification and passes all tests. You will NOT return anything except for the second part of the program that you wrote.

Question: You are given an integer array nums and a positive integer k. Return the number of subarrays where the maximum element of nums appears at least k times in that subarray. A subarray is a contiguous sequence of elements within an array.

Example 1:
nums = [1,3,2,3,3], k = 2
Output: 6.

Explanation: the subarrays that contain the element 3 at least 2 times are:
[1,3,2,3], [1,3,2,3,3], [3,2,3], [3,2,3,3], [2,3,3], [3,3]
Example 2:
nums = [1,4,2,1], k = 3
Output: 0.

Explanation: No subarray contains the element 4 at least 3 times.

Constraints:

$1 \leq \texttt{nums.length} \leq 10^5$

$1 \leq \texttt{nums[i]} \leq 10^6$

$1 \leq k \leq 10^5$

Format: You will use the following starter code to write the solution to the problem and enclose your code within delimiters.
class Solution(object):
    def countSubarrays(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: int
        """
        mx = max(nums)
        result = left = cnt = 0
        for right in range(len(nums)):
            cnt += int(nums[right] == mx)
            while cnt == k:
                cnt -= int(nums[left] == mx)

Answer:

                left += 1
            result += left
        return result

The answer above is just one of many correct responses.
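To make the concatenate-and-run check concrete, here is a minimal sketch. It is not LiveBench's actual harness, which I have not reproduced here; `run_completion` is a hypothetical helper, and the completion is one I wrote to finish the sliding-window loop from the starter code. The two test inputs are the examples from the prompt.

```python
# Starter code from the prompt above (docstring omitted for brevity).
PREFIX = '''class Solution(object):
    def countSubarrays(self, nums, k):
        mx = max(nums)
        result = left = cnt = 0
        for right in range(len(nums)):
            cnt += int(nums[right] == mx)
            while cnt == k:
                cnt -= int(nums[left] == mx)
'''

# Illustrative completion written for this sketch (an assumption, not
# necessarily the dataset's reference answer).
COMPLETION = '''                left += 1
            result += left
        return result
'''

def run_completion(prefix, completion, nums, k):
    """Concatenate the two halves, execute them, and call the method."""
    namespace = {}
    exec(prefix + completion, namespace)
    return namespace["Solution"]().countSubarrays(nums, k)

print(run_completion(PREFIX, COMPLETION, [1, 3, 2, 3, 3], 2))  # 6
print(run_completion(PREFIX, COMPLETION, [1, 4, 2, 1], 3))     # 0
```

A real harness would run untrusted model output in a sandbox and against the full hidden test suite, not just the two examples from the problem statement.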

$\texttt{connections}$

50 recent Connections problems from The New York Times. There are 3 difficulty levels:

15 of the tasks provide a list of 8 words to group.
15 of the tasks provide a list of 12 words to group.
20 of the tasks provide a list of 16 words to group.

In the model’s answer, the order of the groups and the order of the words within each group do not matter. As long as the groups themselves are correct, the model gets full points.

Example:

You are given 8 words/phrases below. Find two groups of four items that share something in common.

Here are a few examples of groups:

bass, flounder, salmon, trout (all four are fish)

ant, drill, island, opal (all four are two-word phrases that start with ‘fire’)

are, why, bee, queue (all four are homophones of letters)

sea, sister, sin, wonder (all four are members of a septet).

Categories will be more specific than e.g., ‘5-letter-words’, ‘names’, or ‘verbs’.

There is exactly one solution. Think step-by-step, and then give your answer in bold as a list of the 8 items separated by commas, ordered by group (for example, bass, founder, salmon, trout, ant, drill, island, opal). If you don’t know the answer, make your best guess.

The items are: use, leverage, through, up, exploit, done, over, milk.

Answer:

exploit, leverage, milk, use, done, over, through, up
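Since neither the order of the groups nor the order of words within a group matters, one way to compare an answer against the reference is to treat each group as a set. The sketch below does that under my own assumptions (comma-separated input, groups of four, all-or-nothing result); the function names are hypothetical, and it does not model whatever partial credit LiveBench may award.

```python
def parse_groups(answer, group_size=4):
    """Split a comma-separated answer into a set of unordered groups."""
    words = [w.strip().lower() for w in answer.split(",")]
    return {
        frozenset(words[i:i + group_size])
        for i in range(0, len(words), group_size)
    }

def same_grouping(model_answer, reference):
    """True if both answers contain exactly the same groups."""
    return parse_groups(model_answer) == parse_groups(reference)

reference = "exploit, leverage, milk, use, done, over, through, up"
shuffled = "up, over, done, through, use, milk, exploit, leverage"
wrong = "use, leverage, through, up, exploit, done, over, milk"
print(same_grouping(shuffled, reference))  # True
print(same_grouping(wrong, reference))     # False
```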

$\texttt{cta}$

As I understand it, this task is just selecting the best name for a column, given a list of column names to choose from. 50 problems. No chain-of-thought allowed.

Example:

Pick the column’s class based on the provided column sample. Choose exactly one of the listed classes. Please respond only with the name of the class.

Column sample:
[[1995], [1964], [1986], [2022], [1985]]
Classes:
['Maize yield' 'code country' 'Year' 'country']

Answer:

$\texttt{Year}$

$\texttt{LCB\textunderscore generation}$

78 coding problems: 38 from AtCoder, 40 from LeetCode.

Similar to $\texttt{coding\textunderscore completion}$ except the model is tasked with writing the full solution instead of only part of the solution.

Example (ABC340_A):

Instructions: You are an expert Python programmer. You will be given a question (problem specification) and will generate a correct Python program that matches the specification and passes all tests. You will NOT return anything except for the program.

Question:

Print an arithmetic sequence with first term A, last term B, and common difference D.

You are only given inputs for which such an arithmetic sequence exists.

The input is given from Standard Input in the following format:
A B D
Print the terms of the arithmetic sequence with first term A, last term B, and common difference D, in order, separated by spaces.

Constraints:

$1 \leq A \leq B \leq 100$

$1 \leq D \leq 100$

There is an arithmetic sequence with first term A, last term B, and common difference D.

All input values are integers.

Sample Input/Output 1:

$\texttt{3 9 2 → 3 5 7 9}$

The arithmetic sequence with first term 3, last term 9, and common difference 2 is $\texttt{(3,5,7,9)}$.

Sample Input/Output 2:

$\texttt{10 10 1 → 10}$

The arithmetic sequence with first term 10, last term 10, and common difference 1 is $\texttt{(10)}$.

Answer:

No ground-truth solution was provided. LiveBench runs the code and checks if it passes all the test cases.
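For reference, here is one way this particular problem could be solved. This is my own sketch, not a LiveBench reference solution, and the function name is made up. It relies on the stated guarantee that B is reachable from A in steps of D.

```python
def arithmetic_sequence(line):
    """Return the sequence for an input line of the form 'A B D'."""
    a, b, d = map(int, line.split())
    # The constraints guarantee that B is reachable from A in steps of D,
    # so range(a, b + 1, d) ends exactly on B.
    return " ".join(str(term) for term in range(a, b + 1, d))

# A submission would call arithmetic_sequence(input()) and print the result;
# here the two sample inputs from the prompt are used instead.
print(arithmetic_sequence("3 9 2"))    # 3 5 7 9
print(arithmetic_sequence("10 10 1"))  # 10
```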

$\texttt{math\textunderscore comp}$

96 challenging high school math competition problems: 50 from the AMC12 2023, 17 from the SMC 2023, and 29 from the AIME 2024.

It seems like the LiveBench authors threw in some strangely specific requirements for the answer formatting on a lot of these. I guess just to make the questions harder?

Example:

Real numbers $x$ and $y$ with $x,y>1$ satisfy $\log_x(y^x)=\log_y(x^{4y})=10.$ What is the value of $xy$? Please think step by step, and then display the answer at the very end of your response. The answer is an integer consisting of exactly 3 digits (including leading zeros), ranging from 000 to 999, inclusive. For example, the answer might be 068 or 972. If you cannot determine the correct answer, take your best guess. Remember to have the three digits as the last part of the response.

Answer:

$\texttt{025}$
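For the curious, the answer drops out in a few lines once the exponents are pulled out of the logarithms and the two equations are multiplied together:

$$x\log_x y = 10, \qquad 4y\log_y x = 10 \;\Longrightarrow\; 4xy\,(\log_x y)(\log_y x) = 100 \;\Longrightarrow\; xy = 25,$$

using $(\log_x y)(\log_y x) = 1$. Padded to three digits as the prompt requires, that is $\texttt{025}$.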

$\texttt{olympiad}$

This set of tasks includes 36 questions: 12 from the International Mathematical Olympiad (IMO) and 24 from the United States of America Mathematical Olympiad (USAMO).

These questions are a combination of matching, multiple choice, and fill-in-the-blank. You are given a problem and a partially complete solution, and you have to fill in a few blank expressions, in the correct order, by picking from a list of expressions at the bottom.

While writing this I found a weird prompt for one of the problems. There was only one expression in the list, and only one expression slot you had to fill in, so it was obvious that the answer was 1 (you didn’t even have to do any math).

I later found out this was done intentionally, and is actually mentioned in the Appendix of the LiveBench paper:

We generate 3 hardness variants for each problem, masking out 10%, 50% and 80% of the equations in the proof. We evaluate by computing the edit distance between the ground truth ranking order and the model predicted ranking order. [NB : in preliminary testing we also evaluated using the accuracy metric and the model rankings remained nearly the same]. Models perform worse on IMO compared to USAMO, in line with expectations. We also looked at the performance as separated by question hardness. The scores are greatly affected by question hardness going from as high as 96.8 for the easiest questions (10% masked out, GPT-4o) to as low as 36 for the hardest (80% masked out). The full results are in Table 6 and Table 7.

If only I had read past the References…

Example (from USAMO):

You are given a question and its solution. The solution however has its formulae masked out using the tag <missing X> where X indicates the identifier for the missing tag. You are also given a list of formulae in latex in the format <expression Y> = <LaTeX code> where Y is the identifier for the formula. Your task is to match the formulae to the missing tags in the solution. Think step by step out loud as to what the answer should be. If you are not sure, give your best guess. Your answer should be in the form of a list of numbers, e.g., 5, 22, 3, …, corresponding to the expression identifiers that fill the missing parts. For example, if your answer starts as 5, 22, 3, …, then that means expression 5 fills <missing 1>, expression 22 fills <missing 2>, and expression 3 fills <missing 3>.

The question is:

In an acute triangle $ABC$, let $M$ be the midpoint of $\overline{BC}$. Let $P$ be the foot of the perpendicular from $C$ to $AM$. Suppose that the circumcircle of triangle $ABP$ intersects line $BC$ at two distinct points $B$ and $Q$. Let $N$ be the midpoint of $\overline{AQ}$. Prove that $NB=NC$.

The solution is:

Let $X$ be the foot from $A$ to $\overline{BC}$. By definition, <missing 3>. Thus, <missing 4>, and $\triangle BMP \sim \triangle AMQ$.

From this, we have <missing 5>, as $MC=MB$. Thus, $M$ is also the midpoint of $XQ$.

Now, <missing 6> if $N$ lies on the perpendicular bisector of $\overline{BC}$. As $N$ lies on the perpendicular bisector of $\overline{XQ}$, which is also the perpendicular bisector of <missing 7> (as $M$ is also the midpoint of $XQ$), we are done.

The formulae are:

<expression 1>: $\triangle BMP \sim \triangle AMQ$

<expression 2>: $\triangle AXM \sim \triangle MPC$

<expression 3>: $\overline{BC}$

<expression 4>: $\angle AXM = \angle MPC = 90^{\circ}$

<expression 5>: $\frac{MP}{MX} = \frac{MC}{MA} = \frac{MP}{MQ} = \frac{MA}{MB}$

<expression 6>: $NB = NC$

<expression 7>: $\triangle AXM \sim \triangle MPC$

Answer:

$7, 1, 4, 2, 5, 6, 3$
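The paper excerpt above says answers are scored by edit distance between the ground-truth order and the model's order. Here is a minimal sketch of plain Levenshtein distance over the identifier lists. How LiveBench turns that distance into a final score is not stated in the excerpt, so the sketch stops at the raw distance; the `guess` list is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]

truth = [7, 1, 4, 2, 5, 6, 3]
guess = [2, 1, 4, 7, 5, 6, 3]  # hypothetical answer with 7 and 2 swapped
print(edit_distance(truth, truth))  # 0
print(edit_distance(truth, guess))  # 2
```

In this example, expressions 2 and 7 are the same formula, so a strict comparison of identifiers like the one in this sketch would still count the swapped answer as two edits away.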

$\texttt{paraphrase}$

50 paraphrase tasks, all of which are based on articles from The Guardian. Each prompt has wacky requirements to try to throw the model off, since this is an instruction following benchmark.

Example:

The following are the beginning sentences of a news article from the Guardian.

OK, so a mysterious, cigar-shaped, 400m-long object is speeding through the solar system and astronomers are checking it for evidence of alien technology. So what do we do if it turns out that Oumuamua, as they have named it, is broadcasting extraterrestrial radio signals? John Chambers, Leeds Post your answers – and new questions – below or email them to nq@theguardian.com

Please paraphrase based on the sentences provided. Answer with less than 274 words. Your response must have 1 sections. Mark the beginning of each section with Section X, such as:

Section 1

[content of section 1]

Section 2

[content of section 2]

At the end of your response, please explicitly add a postscript starting with P.S.

Answer:

No ground-truth answer is provided. LiveBench runs checks on the output to verify that it meets the stated criteria in the prompt.
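To give a feel for what such checks might look like, here is a rough sketch for the three criteria in the example prompt above (word limit, section count, postscript). These are my own loose interpretations, not LiveBench's actual checkers, and `check_paraphrase` is a made-up name. For instance, the postscript check only looks for a line starting with "P.S." and does not verify that it comes last.

```python
import re

def check_paraphrase(response, max_words=274, num_sections=1):
    """Return pass/fail results for the three stated criteria."""
    words = response.split()
    # Section markers are assumed to sit on their own line, e.g. "Section 1".
    sections = re.findall(r"^Section \d+\s*$", response, flags=re.MULTILINE)
    postscript = re.search(r"^P\.S\.", response, flags=re.MULTILINE)
    return {
        "under_word_limit": len(words) < max_words,
        "section_count": len(sections) == num_sections,
        "has_postscript": postscript is not None,
    }

sample = (
    "Section 1\n"
    "Astronomers are studying a long, fast-moving object.\n\n"
    "P.S. Send in your ideas."
)
print(check_paraphrase(sample))
```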

$\texttt{plot\textunderscore unscrambling}$

40 headache-inducing puzzles about unscrambling a set of sentences that describe the plot of a movie.

Example:

The following plot summary of a movie has had the sentences randomly reordered. Rewrite the plot summary with the sentences correctly ordered. Begin the plot summary with <PLOT_SUMMARY>.

The sentences are:

While they both live alone, the two are friendly with one another – Clay sees Eloise as the only person who ever took care of him. He rents some space in a barn owned by retired teacher Eloise Parker, a widow who owns and lives in the house on the property. Derek assigns the company’s head of security, former CIA director Wallace Westwyld, to find a way to stop Adam. Verona learns that Eloise was scammed out of every penny she had. Devastated by the realization that she got scammed out of so much, Eloise shoots herself in the head. Upon learning the hard way that Adam is after them, Garnett explains the situation to the crew’s ringleader, 28-year-old tech executive Derek Danforth, who runs a Boston-based corporation called Danforth Enterprises. In Hampden, Massachusetts, Adam Clay is a beekeeper who has several hives of bees. But Wallace learns that Adam is a retired member of a classified program called the Beekeepers, whose members are tasked with fighting different forms of corruption, operating above and beyond governmental jurisdiction. The call center’s manager, Mickey Garnett, cons Eloise out of everything – including more than $2,000,000 that’s in the account of Safe Homes Foundation, a children’s charity whose account she manages. The Beekeepers are so efficient and well-trained that they make the military look like a joke. Adam is quickly cleared when Eloise’s death is ruled a suicide because there was no gunshot residue on Adam, and Eloise’s fingerprints were the only prints on the gun. Verona apologizes to Adam for her accusation, and she tells him that the FBI cyber-crimes office has told her that the scammer crew that victimized Eloise has been operating for two years, but the FBI hasn’t been able to identify any of the scammers. Adam finds Eloise’s body in her house, and he’s immediately arrested by Eloise’s daughter, Boston-based FBI Special Agent Verona Parker, who hastily accuses Adam of shooting Eloise. 
She calls the number on the screen, and connects to a call center that’s located in Springfield, Massachusetts, not aware that it’s a phishing scam. Verona vows to find the scammers, but Adam, enraged by what happened, decides to hunt down the scammers himself, and make them pay for what they did to Eloise. Wallace realizes that Adam is a man who should be feared by people like Derek. Adam proves to be an unstoppable force, and Derek, Garnett, and the rest of the scammers have no idea what kind of scorched-Earth hell Adam is about to unleash on them. While checking some things on her laptop computer, Eloise sees a warning about two viruses in the system.

Answer:

In Hampden, Massachusetts, Adam Clay is a beekeeper who has several hives of bees. He rents some space in a barn owned by retired teacher Eloise Parker, a widow who owns and lives in the house on the property. While they both live alone, the two are friendly with one another – Clay sees Eloise as the only person who ever took care of him. While checking some things on her laptop computer, Eloise sees a warning about two viruses in the system. She calls the number on the screen, and connects to a call center that’s located in Springfield, Massachusetts, not aware that it’s a phishing scam. The call center’s manager, Mickey Garnett, cons Eloise out of everything – including more than $2,000,000 that’s in the account of Safe Homes Foundation, a children’s charity whose account she manages. Devastated by the realization that she got scammed out of so much, Eloise shoots herself in the head. Adam finds Eloise’s body in her house, and he’s immediately arrested by Eloise’s daughter, Boston-based FBI Special Agent Verona Parker, who hastily accuses Adam of shooting Eloise. Verona learns that Eloise was scammed out of every penny she had. Adam is quickly cleared when Eloise’s death is ruled a suicide because there was no gunshot residue on Adam, and Eloise’s fingerprints were the only prints on the gun. Verona apologizes to Adam for her accusation, and she tells him that the FBI cyber-crimes office has told her that the scammer crew that victimized Eloise has been operating for two years, but the FBI hasn’t been able to identify any of the scammers. Verona vows to find the scammers, but Adam, enraged by what happened, decides to hunt down the scammers himself, and make them pay for what they did to Eloise. Upon learning the hard way that Adam is after them, Garnett explains the situation to the crew’s ringleader, 28-year-old tech executive Derek Danforth, who runs a Boston-based corporation called Danforth Enterprises. 
Derek assigns the company’s head of security, former CIA director Wallace Westwyld, to find a way to stop Adam. But Wallace learns that Adam is a retired member of a classified program called the Beekeepers, whose members are tasked with fighting different forms of corruption, operating above and beyond governmental jurisdiction. The Beekeepers are so efficient and well-trained that they make the military look like a joke. Wallace realizes that Adam is a man who should be feared by people like Derek. Adam proves to be an unstoppable force, and Derek, Garnett, and the rest of the scammers have no idea what kind of scorched-Earth hell Adam is about to unleash on them.

I honestly don’t even know how you’re supposed to do this one.

$\texttt{simplify}$

50 text simplification tasks, all of which are based on articles from The Guardian. Each prompt has wacky requirements to try to throw the model off, since this is an instruction following benchmark.

Example:

The following are the beginning sentences of a news article from the Guardian.

Amsterdam has won the right to become the new host for the European Medicines Agency (EMA). In a nail-biting final round last night, the 19 European cities that had put in bids had been whittled down to Milan and Amsterdam, sharing an equal number of votes. A draw from a hat sealed it for Amsterdam. Moments later, the same scenario played out for the European Banking Authority (EBA), with Paris and Dublin going into a hat and Paris being drawn. And so it is settled. The EMA will move from London to Amsterdam after Brexit – taking with it nearly 900 jobs, a budget of €322m, and some 40,000 business visits every year, which support local hotels, restaurants, taxis and so on. Also likely to move with the EMA is the attendant industry that congregates around it for easy access to the regulator. It’s a substantial loss of finances, talent, infrastructure and influence. As the EMA leaves the UK, the question now becomes: does the UK leave the EMA? The EMA is the regulatory body for the single market for medicines, and the two are entwined.

Please explain in simpler terms what this text means. Include keywords [‘branch’, ‘currency’, ‘object’, ‘request’, ‘yesterday’] in the response. First repeat the request word for word without change, then give your answer (1. do not say any words or characters before repeating the request; 2. the request you need to repeat does not include this sentence)

Answer:

No ground-truth answer is provided. LiveBench runs checks on the output to verify that it meets the stated criteria in the prompt.

$\texttt{spatial}$

50 word problems about making cuts through various shapes (in two and three dimensions) and determining the number / shapes of the remaining pieces.

Example:

Suppose I have a physical, solid square with vertices ABCD and a physical, solid equilateral triangle with vertices EFG. I place both shapes on a plane and arrange them so that they are not overlapping at all, but F is touching A, and G is touching B. Then I make two cuts through ED and through DG. Then I separate all the pieces (e.g. so F is no longer touching A, and so on). How many pieces are there? Think step by step, and then put your answer in bold as a single integer (for example, 0). If you don’t know, guess.

Answer:

$5$

$\texttt{story\textunderscore generation}$

50 story generation tasks, all of which are based on articles from The Guardian. Each prompt has wacky requirements to try to throw the model off, since this is an instruction following benchmark.

Example:

The following are the beginning sentences of a news article from the Guardian.

This is a bankrupt budget. Not in the strictly financial sense, though how much more threadbare core public services can become without collapsing and causing social mayhem the next few years will prove, if the government lasts. Even with faltering economic growth, public spending is to go on falling as a proportion of GDP. It’s bankrupt in ideas, in understanding, in preparedness to examine what has been happening to public services. Housing offers a glaring example. For all the bells and whistles in the budget, and some welcome augmentation of council powers, the government fails to make an obvious connexion. Building houses, allocating land, encouraging development, and policing the delinquency of private developers all imply an active and financially lubricated local government. Housing is and always will be about places, streets, brownfields – and public acceptance of schemes that will abut on their property or where they walk their dog. That’s what councillors do. Ace ideologue of the free market Oliver Letwin, of all people, can’t substitute.

Please generate a story based on the sentences provided. Wrap your entire response with double quotation marks. Your answer must contain exactly 4 bullet points. Use the markdown bullet points such as:

* This is point 1.

* This is point 2

Finish your response with this exact phrase: Any other questions?

No other words should follow this phrase. Your response must have 3 sections. Mark the beginning of each section with SECTION X, such as:

SECTION 1

[content of section 1]

SECTION 2

[content of section 2]

Answer:

No ground-truth answer is provided. LiveBench runs checks on the output to verify that it meets the stated criteria in the prompt.

$\texttt{summarize}$

50 summarization tasks, all of which are based on articles from The Guardian. Each prompt has wacky requirements to try to throw the model off, since this is an instruction following benchmark.

Example:

The following are the beginning sentences of a news article from the Guardian.

In July this year, everyone said that the World Cup final felt like a turning point. You don’t get 27,000 people to a women’s cricket match and not think that something extraordinary is going on. But the truth of turning points is that you can’t in the moment judge whether they’re real or perceived. It has taken the Women’s Ashes in Australia this past month to show the extent of the turn. Australia originally lagged behind England in embracing the game. The 2015 Ashes was played at intimate cricket grounds, selling out some matches with crowds in excess of 5000. The 2013-14 version in Australia was nowhere near that. Attendances at the Perth Test were in the low hundreds, while the Twenty20s were sparsely attended curtain-raisers for a meaningless men’s series. Olympiads stack up like sedimentary layers, and the difference from four years ago to now is extraordinary. The day-night Test match drew over 12,600 across its duration, while the three T20 matches drew a bit over or a bit under 4000 spectators apiece.

Please summarize based on the sentences provided. Give two different responses. Responses and only responses should be separated by 6 asterisk symbols: $\texttt{******}$.

Answer:

No ground-truth answer is provided. LiveBench runs checks on the output to verify that it meets the stated criteria in the prompt.
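For this example prompt, the stated criterion is easy to test mechanically. Below is a minimal sketch of such a check, assuming the output is split on the literal six-asterisk separator. It is my own interpretation with hypothetical function names, not LiveBench's code.

```python
def split_responses(output, separator="******"):
    """Split on the separator and trim surrounding whitespace."""
    return [part.strip() for part in output.split(separator)]

def has_two_responses(output):
    """True if there are exactly two non-empty, different responses."""
    parts = split_responses(output)
    return len(parts) == 2 and all(parts) and parts[0] != parts[1]

print(has_two_responses("First summary.\n******\nSecond summary."))  # True
print(has_two_responses("Only one summary."))                        # False
```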

$\texttt{tablejoin}$

50 problems about determining the best column mapping between two CSV tables.

Table A has meaningful column names and data. Table B has meaningful data in at least some columns, but its column names are garbled/meaningless. The model has to decide on an appropriate mapping from columns in A (not necessarily all of them) to columns in B, based on which columns have similar data.

Example:

Please create a valid join mapping between CSV Table A and CSV Table B. Each column in A maps to 0 or 1 columns in B. Return your response as a Python dictionary, formatted as {col_name_in_df_a : col_name_in_df_b}. Please return only the dictionary.

CSV Table A:

zipcode,year,life_expectancy
94531,2013,79.02
94539,2013,85.45
94533,2013,79.4
94518,2000,79.18
95132,2013,82.45
95430,2000,79.81
94924,2000,79.37
94549,2000,80.92
95461,2000,81.04
94577,2013,81.02
94305,2000,81.45
94535,2013,79.4
94930,2013,85.98
94619,2000,78.3
94063,2000,78.4
95070,2000,81.04
95401,2013,79.95
94074,2000,80.36
94609,2013,78.0

CSV Table B:

j0ihiCMCXaU,gG+PnzOD1mw,DOgXTTuHGbo
0,94583,2000
0,94506,2013
0,95446,2000
0,94567,2013
0,95120,2000
0,94306,2000
0,95687,2000
0,94040,2013
0,94567,2000
0,95688,2013
0,94938,2013
0,95037,2000
0,94702,2013
0,95121,2000
0,95037,2013
0,94607,2013
0,94929,2000
0,94705,2013
0,94608,2000
0,94109,2013

Answer:

{"year": "DOgXTTuHGbo", "zipcode": "gG+PnzOD1mw"}
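The prompt asks for a Python dictionary whose keys are columns of A and whose values are columns of B. A minimal sketch of checking that an answer has this form is below. It only validates the shape of the answer, not whether the mapping is the right one, and `is_valid_join_mapping` is a hypothetical helper rather than anything from LiveBench.

```python
import ast

def is_valid_join_mapping(answer, columns_a, columns_b):
    """True if `answer` parses to a dict from A's columns to B's columns."""
    try:
        mapping = ast.literal_eval(answer)
    except (ValueError, SyntaxError):
        return False
    if not isinstance(mapping, dict):
        return False
    return all(a in columns_a and b in columns_b for a, b in mapping.items())

columns_a = ["zipcode", "year", "life_expectancy"]
columns_b = ["j0ihiCMCXaU", "gG+PnzOD1mw", "DOgXTTuHGbo"]
answer = '{"year": "DOgXTTuHGbo", "zipcode": "gG+PnzOD1mw"}'
print(is_valid_join_mapping(answer, columns_a, columns_b))  # True
```

Columns of A that are left out of the dictionary (here `life_expectancy`) correspond to the "maps to 0 columns" case.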

$\texttt{tablereformat}$

50 problems of converting an HTML table to JSON.

Example:

Please convert the Input Table from HTML format to JSON format. Please respond only with the table.

Input Table:

<table border="1" class="dataframe">
 <thead>
   <tr style="text-align: right;">
     <th>Country</th>
     <th>Inequality HDI</th>
   </tr>
 </thead>
 <tbody>
   <tr>
     <td>Indonesia</td>
     <td>2</td>
   </tr>
   <tr>
     <td>Azerbaijan</td>
     <td>1</td>
   </tr>
   <tr>
     <td>Denmark</td>
     <td>0</td>
   </tr>
   <tr>
     <td>North Macedonia</td>
     <td>2</td>
   </tr>
   <tr>
     <td>Canada</td>
     <td>0</td>
   </tr>
   <tr>
     <td>Palau</td>
     <td>2</td>
   </tr>
   <tr>
     <td>Papua New Guinea</td>
     <td>3</td>
   </tr>
   <tr>
     <td>Samoa</td>
     <td>2</td>
   </tr>
   <tr>
     <td>Marshall Islands</td>
     <td>2</td>
   </tr>
   <tr>
     <td>Lebanon</td>
     <td>2</td>
   </tr>
 </tbody>
</table>

Answer (prettified JSON):

{
  "111": {
    "Country": "Indonesia",
    "Inequality HDI": 2
  },
  "88": {
    "Country": "Azerbaijan",
    "Inequality HDI": 1
  },
  "4": {
    "Country": "Denmark",
    "Inequality HDI": 0
  },
  "83": {
    "Country": "North Macedonia",
    "Inequality HDI": 2
  },
  "17": {
    "Country": "Canada",
    "Inequality HDI": 0
  },
  "70": {
    "Country": "Palau",
    "Inequality HDI": 2
  },
  "153": {
    "Country": "Papua New Guinea",
    "Inequality HDI": 3
  },
  "115": {
    "Country": "Samoa",
    "Inequality HDI": 2
  },
  "101": {
    "Country": "Marshall Islands",
    "Inequality HDI": 2
  },
  "108": {
    "Country": "Lebanon",
    "Inequality HDI": 2
  }
}
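As a rough illustration of the conversion itself, here is a sketch that parses a simple HTML table with Python's standard `html.parser` and emits a list of row records. The row keys in the answer above ("111", "88", …) do not appear anywhere in the HTML shown, so this sketch does not try to reproduce them. The class and function names are my own.

```python
import json
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect header names and body rows from a simple HTML table."""

    def __init__(self):
        super().__init__()
        self.headers, self.rows = [], []
        self._cell = None  # tag of the cell currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._cell = tag

    def handle_data(self, data):
        if self._cell == "th":
            self.headers.append(data.strip())
        elif self._cell == "td":
            self.rows[-1].append(data.strip())

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None

def to_value(text):
    """Turn digit-only cells into ints; leave everything else as text."""
    return int(text) if text.isdigit() else text

def table_to_records(html):
    parser = TableParser()
    parser.feed(html)
    # The header row contributes an empty list to `rows`; skip it.
    return [
        dict(zip(parser.headers, map(to_value, row)))
        for row in parser.rows if row
    ]

html = """<table><thead><tr><th>Country</th><th>Inequality HDI</th></tr></thead>
<tbody><tr><td>Indonesia</td><td>2</td></tr>
<tr><td>Denmark</td><td>0</td></tr></tbody></table>"""
print(json.dumps(table_to_records(html), indent=2))
```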

$\texttt{typos}$

50 tasks of determining the intended version of given text, that is, removing all spelling errors and typos.

Example:

Please output this exact text, with no changes at all except for fixing the misspellings. Please leave all other stylistic decisions like commas and US vs British spellings as in the original text.

This paper is a complement of the modularity result of Bruinier, Howard, Kudla, Rapoport and Yang (BHKRY) for the special case $U(1,1)$ not considered ther. The main idea is to embed a $U(1, 1)$ Shimura curve to many $U(n-1, 1)$ Shimura varieties for big $n$, andd prove a precise pullback formula ofther generating series of arithmetic divisors. Afterwards, we uise the modularity result of BHKRY together iwth the existince of non-vanishing of clasical theta series at any given point inhten upper half plane to proovehten modularity result on $U(1, 1)$ Shimura curves.

Answer:

This paper is a complement of the modularity result of Bruinier, Howard, Kudla, Rapoport and Yang (BHKRY) for the special case $U(1,1)$ not considered there. The main idea is to embed a $U(1, 1)$ Shimura curve to many $U(n-1, 1)$ Shimura varieties for big $n$, and prove a precise pullback formula of the generating series of arithmetic divisors. Afterwards, we use the modularity result of BHKRY together with the existence of non-vanishing of classical theta series at any given point in the upper half plane to prove the modularity result on $U(1, 1)$ Shimura curves.

$\texttt{web\textunderscore of\textunderscore lies\textunderscore v2}$

50 headache-inducing (procedurally generated, I assume) logic puzzles about people who either always lie or always tell the truth. Who is honest is revealed only indirectly, through each person's location and the claims they make about one another, which may themselves be truths or lies.

Example:

In this question, assume each person either always tells the truth or always lies.

The person at the theater says the person at the ice skating rink tells the truth. The person at the gym tells the truth. Beatriz is at the gym. The person at the ice skating rink thinks their friend is lying. Grace is at the campground. Hiroshi is at the theater. Emily is at the farm. The person at the campground says the person at the observatory lies. The person at the botanical garden tells the truth. The person at the cafe says the person at the campground lies. The person at the farm lies. Priya is at the park. Maya is at the library. The person at the ice skating rink says the person at the city hall tells the truth. Charlie is at the cafe. The person at the park tells the truth. The person at the ice skating rink saw a firetruck. The person at the beach says the person at the theater lies. Nadia is at the ice skating rink. Ethan is at the observatory. The person at the campground lies. Max is at the museum. Ayaan is at the hotel. Jake is at the city hall. Jaxon is at the skate park. Luna is at the beach. Kehinde is at the train station. The person at the campground saw a firetruck. The person at the museum says the person at the theater lies. Olivia is at the botanical garden. The person at the theater says the person at the train station lies. The person at the skate park lies. The person at the ice skating rink says the person at the library tells the truth. The person at the ice skating rink says the person at the campground tells the truth. The person at the hotel says the person at the ice skating rink lies.

Does the person at the theater tell the truth?

Does the person at the ice skating rink tell the truth?

Does the person at the campground tell the truth?

Think step by step, and then put your answer in bold as a list of three words, yes or no (for example, yes, no, yes). If you don’t know, guess.

Answer:

no, no, no

Yes, all 50 of them are like this.
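Tedious as they are, these puzzles are mechanical to solve. Here is a minimal sketch (my own, not LiveBench's grader or generator) that encodes the example above and propagates truth values until nothing changes. The function name `propagate` and the tuple encoding of each claim are assumptions made for illustration, and the distractor sentences (the firetruck, the "friend") are simply left out.

```python
def propagate(known, statements):
    """Infer who tells the truth by repeatedly applying each claim.

    known: dict mapping location -> True (truthful) / False (liar)
    statements: list of (speaker, target, claim), where claim is True if the
        speaker says the target tells the truth, False if they say the target lies.
    """
    status = dict(known)
    changed = True
    while changed:
        changed = False
        for speaker, target, claim in statements:
            if speaker in status and target not in status:
                # A truth-teller's claim holds; a liar's claim is inverted.
                status[target] = claim if status[speaker] else not claim
                changed = True
            elif target in status and speaker not in status:
                # The speaker is truthful exactly when the claim matches reality.
                status[speaker] = status[target] == claim
                changed = True
    return status


# Facts stated outright in the example prompt.
known = {
    "gym": True, "botanical garden": True, "park": True,
    "farm": False, "campground": False, "skate park": False,
}

# Claims from the example prompt, as (speaker, target, claim).
statements = [
    ("theater", "ice skating rink", True),
    ("campground", "observatory", False),
    ("cafe", "campground", False),
    ("ice skating rink", "city hall", True),
    ("beach", "theater", False),
    ("museum", "theater", False),
    ("theater", "train station", False),
    ("ice skating rink", "library", True),
    ("ice skating rink", "campground", True),
    ("hotel", "ice skating rink", False),
]

status = propagate(known, statements)
answer = ", ".join(
    "yes" if status[place] else "no"
    for place in ("theater", "ice skating rink", "campground")
)
print(answer)  # no, no, no
```

The chain that matters is short: the campground lies, the ice skating rink vouches for the campground and so must be lying, and the theater vouches for the ice skating rink and so must be lying too. Everything else in the prompt is noise.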

$\texttt{zebra\textunderscore puzzle}$

50 logic puzzles about determining an attribute of a person based on a series of relational statements.

Example:

There are 3 people standing in a line numbered 1 through 3 in a left to right order.

Each person has a set of attributes: Beverage, Transport, Food.

The attributes have the following possible values:

Beverage: cola, coffee, iced-tea

Transport: motorbike, quad-bike, bike

Food: tomato, zucchini, asparagus

and exactly one person in the line has a given value for an attribute.

Given the following premises about the line of people:

the person that likes tomato is on the immediate right of the person who drinks cola

the person who drinks coffee is somewhere to the right of the person who drinks iced-tea

the person that travels by motorbike is not anywhere to the right of the person who drinks cola

the person that travels by bike is on the immediate right of the person that travels by quad-bike

the person that likes zucchini is on the far right

Answer the following question:

What food does the person that travels by motorbike like? Return your answer as a single word, in the following format: ***X***, where X is the answer.

Answer:

asparagus
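With only 3 people and 3 attributes, the example above can be checked by brute force. The following is a rough sketch (my own, not part of LiveBench) that tries every assignment of values to positions and keeps those satisfying all five premises. The function name `solve_zebra` and the dictionary layout are assumptions made for illustration.

```python
from itertools import permutations

BEVERAGES = ("cola", "coffee", "iced-tea")
TRANSPORTS = ("motorbike", "quad-bike", "bike")
FOODS = ("tomato", "zucchini", "asparagus")


def solve_zebra():
    """Return the set of foods the motorbike rider could like."""
    answers = set()
    for bev_pos in permutations((1, 2, 3)):
        bev = dict(zip(BEVERAGES, bev_pos))  # value -> position in line
        # Coffee is somewhere to the right of iced-tea.
        if not bev["coffee"] > bev["iced-tea"]:
            continue
        for tr_pos in permutations((1, 2, 3)):
            tr = dict(zip(TRANSPORTS, tr_pos))
            # Motorbike is not anywhere to the right of cola.
            if tr["motorbike"] > bev["cola"]:
                continue
            # Bike is on the immediate right of quad-bike.
            if tr["bike"] != tr["quad-bike"] + 1:
                continue
            for food_pos in permutations((1, 2, 3)):
                food = dict(zip(FOODS, food_pos))
                # Tomato is on the immediate right of cola.
                if food["tomato"] != bev["cola"] + 1:
                    continue
                # Zucchini is on the far right.
                if food["zucchini"] != 3:
                    continue
                food_at = {pos: name for name, pos in food.items()}
                answers.add(food_at[tr["motorbike"]])
    return answers


print(solve_zebra())  # {'asparagus'}
```

Exactly one line-up survives the premises (cola, motorbike and asparagus in position 1), so the answer is unambiguous. Enumeration like this stops being practical as the number of people and attributes grows, which is where a constraint solver would be the better tool.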

