Hacker News Top Stories with Summaries (June 29, 2023)
Here are the top stories from Hacker News with summaries for June 29, 2023:
XGen-7B, a new 7B foundational model trained on up to 8K length for 1.5T tokens
Summary: Researchers have released XGen-7B, a series of 7B large language models (LLMs) trained on input sequences of up to 8K tokens for up to 1.5T tokens. XGen-7B achieves results comparable to or better than state-of-the-art open-source LLMs on standard NLP benchmarks, and it also performs well on both text and code benchmarks such as MMLU and HumanEval. Because it is designed to handle long sequences, the model suits applications like summarizing long documents, writing code, and predicting protein sequences.
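For readers who want to try the model, here is a minimal sketch using the Hugging Face transformers library; the model id "Salesforce/xgen-7b-8k-base" and the trust_remote_code tokenizer flag reflect the public release and are not details from the story itself:

    # Minimal sketch: generate with XGen-7B via Hugging Face transformers.
    # Assumption: the released checkpoint is "Salesforce/xgen-7b-8k-base" and
    # its custom tokenizer requires trust_remote_code=True.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("Salesforce/xgen-7b-8k-base", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("Salesforce/xgen-7b-8k-base")

    # Inputs up to the 8K-token training length can be processed in one pass,
    # which is what makes long-document summarization feasible.
    prompt = "Summarize the following article:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
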
OpenOrca: open source dataset and instruct-tuned LLMs
Summary: Eric Hartford announces OpenOrca, an open-source dataset and series of instruct-tuned language models inspired by Microsoft Research's Orca, which learns from GPT-4 explanation traces. With the help of an AI/ML engineering team, they created a dataset of ~1 million FLANv2 entries augmented with GPT-4 completions and ~3.5 million FLANv2 entries augmented with GPT-3.5 completions. OpenOrca-LLaMA-13b is expected to be released in mid-July 2023, together with evaluation findings and publication of the dataset. The team is seeking GPU compute sponsors to train OpenOrca on various platforms.
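As a rough illustration of how the data could be inspected once published, here is a sketch using the Hugging Face datasets library; the dataset id "Open-Orca/OpenOrca" is an assumption, since the dataset had not yet been released when this story ran:

    # Minimal sketch: stream a few OpenOrca records without downloading the
    # full ~4.5M augmented FLANv2 entries up front. Assumption: the dataset
    # is hosted on the Hugging Face Hub as "Open-Orca/OpenOrca".
    from datasets import load_dataset

    ds = load_dataset("Open-Orca/OpenOrca", split="train", streaming=True)
    for i, record in enumerate(ds):
        print(record)  # each record pairs a FLANv2 prompt with a GPT-4 or GPT-3.5 completion
        if i >= 2:
            break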