Jump to content

How to store a trie

Wictorian
26 minutes ago, Wictorian said:

a trie

A what? 

Main System (Byarlant): Ryzen 7 5800X | Asus B550-Creator ProArt | EK 240mm Basic AIO | 16GB G.Skill DDR4 3200MT/s CAS-14 | XFX Speedster SWFT 210 RX 6600 | Samsung 990 PRO 2TB / Samsung 960 PRO 512GB / 4× Crucial MX500 2TB (RAID-0) | Corsair RM750X | a 10G NIC (pending) | Inateck USB 3.0 Card | Hyte Y60 Case | Dell U3415W Monitor | Keychron K4 Brown (white backlight)

 

Laptop (Narrative): Lenovo Flex 5 81X20005US | Ryzen 5 4500U | 16GB RAM (soldered) | Vega 6 Graphics | SKHynix P31 1TB NVMe SSD | Intel AX200 Wifi (all-around awesome machine)

 

Proxmox Server (Veda): Ryzen 7 3800XT | AsRock Rack X470D4U | Corsair H80i v2 | 64GB Micron DDR4 ECC 3200MT/s | 4x 10TB WD Whites / 4x 14TB Seagate Exos / 2× Samsung PM963a 960GB SSD | Seasonic Prime Fanless 500W | Intel X540-T2 10G NIC | LSI 9207-8i HBA | Fractal Design Node 804 Case (side panels swapped to show off drives) | VMs: TrueNAS Scale; Ubuntu Server (PiHole/PiVPN/NGINX?); Windows 10 Pro; Ubuntu Server (Apache/MySQL)


Media Center/Video Capture (Jesta Cannon): Ryzen 5 1600X | ASRock B450M Pro4 R2.0 | Noctua NH-L12S | 16GB Crucial DDR4 3200MT/s CAS-22 | EVGA GTX750Ti SC | UMIS NVMe SSD 256GB / TEAMGROUP MS30 1TB | Corsair CX450M | Viewcast Osprey 260e Video Capture | Mellanox ConnectX-2 10G NIC | LG UH12NS30 BD-ROM | Silverstone Sugo SG-11 Case | Sony XR65A80K

 

Camera: Sony ɑ7II w/ Meike Grip | Sony SEL24240 | Samyang 35mm ƒ/2.8 | Sony SEL50F18F | Sony SEL2870 (kit lens) | PNY Elite Perfomance 512GB SDXC card

 

Network:

Spoiler
                           ┌─────────────── Office/Rack ────────────────────────────────────────────────────────────────────────────┐
Google Fiber Webpass ────── UniFi Security Gateway ─── UniFi Switch 8-60W ─┬─ UniFi Switch Flex XG ═╦═ Veda (Proxmox Virtual Switch)
(500Mbps↑/500Mbps↓)                             UniFi CloudKey Gen2 (PoE) ─┴─ Veda (IPMI)           ╠═ Veda-NAS (HW Passthrough NIC)
╔═══════════════════════════════════════════════════════════════════════════════════════════════════╩═ Narrative (Asus USB 2.5G NIC)
║ ┌────── Closet ──────┐   ┌─────────────── Bedroom ──────────────────────────────────────────────────────┐
╚═ UniFi Switch Flex XG ═╤═ UniFi Switch Flex XG ═╦═ Byarlant
   (PoE)                 │                        ╠═ Narrative (Cable Matters USB-PD 2.5G Ethernet Dongle)
                         │                        ╚═ Jesta Cannon*
                         │ ┌─────────────── Media Center ──────────────────────────────────┐
Notes:                   └─ UniFi Switch 8 ─────────┬─ UniFi Access Point nanoHD (PoE)
═══ is Multi-Gigabit                                ├─ Sony Playstation 4 
─── is Gigabit                                      ├─ Pioneer VSX-S520
* = cable passed to Bedroom from Media Center       ├─ Sony XR65A80K (Google TV)
** = cable passed from Media Center to Bedroom      └─ Work Laptop** (Startech USB-PD Dock)

Retired/Other:

Spoiler

Laptop (Rozen-Zulu): Sony VAIO VPCF13WFX | Core i7-740QM | 8GB Patriot DDR3 | GT 425M | Samsung 850EVO 250GB SSD | Blu-ray Drive | Intel 7260 Wifi (lived a good life, retired with honor)

Testbed/Old Desktop (Kshatriya): Xeon X5470 @ 4.0GHz | ZALMAN CNPS9500 | Gigabyte EP45-UD3L | 8GB Nanya DDR2 400MHz | XFX HD6870 DD | OCZ Vertex 3 Max-IOPS 120GB | Corsair CX430M | HooToo USB 3.0 PCIe Card | Osprey 230 Video Capture | NZXT H230 Case

TrueNAS Server (La Vie en Rose): Xeon E3-1241v3 | Supermicro X10SLL-F | Corsair H60 | 32GB Micron DDR3L ECC 1600MHz | 1x Kingston 16GB SSD / Crucial MX500 500GB

Link to comment
Share on other sites

Link to post
Share on other sites

4 hours ago, AbydosOne said:

A what? 

I assume this: https://en.m.wikipedia.org/wiki/Trie

When you're working with such recursive data structures, you might want to also learn about Common Table Expressions (CTE)

 

Since this is basically a tree, a table with a key, some value(s) and a reference to a key should suffice.

~edit:

trie {
  id: int,
  parentId: int,
  …
  <values>
}

 

image.png.523933cada16ccc09bebb7b4e2d0a10f.png

 

For example this trie could be stored as:

 Id | ParentId | Value
  0 |     null | 
  1 |        0 | t
  2 |        0 | i
  3 |        0 | A
  4 |        1 | to
  5 |        1 | te
  6 |        2 | in
  7 |        5 | tea
  8 |        5 | ted
  9 |        5 | ten
 10 |        6 | inn

Remember to either quote or @mention others, so they are notified of your reply

Link to comment
Share on other sites

Link to post
Share on other sites

36 minutes ago, Eigenvektor said:

Oh, there's a name for that concept... surprised I haven't heard it before...

 

36 minutes ago, Eigenvektor said:

For example this trie could be stored as:

Couldn't you iterate up the parent IDs and store only one character at each node? Or maybe I'm missing the point of a trie (I just like iterating up pointer trails for some reason, so that's how I end up imagining algorithms).

 Id | ParentId | Value
  0 |     null | 
  1 |        0 | t
  2 |        0 | i
  3 |        0 | A
  4 |        1 | o
  5 |        1 | e
  6 |        2 | n
  7 |        5 | a
  8 |        5 | d
  9 |        5 | n
 10 |        6 | n

Main System (Byarlant): Ryzen 7 5800X | Asus B550-Creator ProArt | EK 240mm Basic AIO | 16GB G.Skill DDR4 3200MT/s CAS-14 | XFX Speedster SWFT 210 RX 6600 | Samsung 990 PRO 2TB / Samsung 960 PRO 512GB / 4× Crucial MX500 2TB (RAID-0) | Corsair RM750X | a 10G NIC (pending) | Inateck USB 3.0 Card | Hyte Y60 Case | Dell U3415W Monitor | Keychron K4 Brown (white backlight)

 

Laptop (Narrative): Lenovo Flex 5 81X20005US | Ryzen 5 4500U | 16GB RAM (soldered) | Vega 6 Graphics | SKHynix P31 1TB NVMe SSD | Intel AX200 Wifi (all-around awesome machine)

 

Proxmox Server (Veda): Ryzen 7 3800XT | AsRock Rack X470D4U | Corsair H80i v2 | 64GB Micron DDR4 ECC 3200MT/s | 4x 10TB WD Whites / 4x 14TB Seagate Exos / 2× Samsung PM963a 960GB SSD | Seasonic Prime Fanless 500W | Intel X540-T2 10G NIC | LSI 9207-8i HBA | Fractal Design Node 804 Case (side panels swapped to show off drives) | VMs: TrueNAS Scale; Ubuntu Server (PiHole/PiVPN/NGINX?); Windows 10 Pro; Ubuntu Server (Apache/MySQL)


Media Center/Video Capture (Jesta Cannon): Ryzen 5 1600X | ASRock B450M Pro4 R2.0 | Noctua NH-L12S | 16GB Crucial DDR4 3200MT/s CAS-22 | EVGA GTX750Ti SC | UMIS NVMe SSD 256GB / TEAMGROUP MS30 1TB | Corsair CX450M | Viewcast Osprey 260e Video Capture | Mellanox ConnectX-2 10G NIC | LG UH12NS30 BD-ROM | Silverstone Sugo SG-11 Case | Sony XR65A80K

 

Camera: Sony ɑ7II w/ Meike Grip | Sony SEL24240 | Samyang 35mm ƒ/2.8 | Sony SEL50F18F | Sony SEL2870 (kit lens) | PNY Elite Perfomance 512GB SDXC card

 

Network:

Spoiler
                           ┌─────────────── Office/Rack ────────────────────────────────────────────────────────────────────────────┐
Google Fiber Webpass ────── UniFi Security Gateway ─── UniFi Switch 8-60W ─┬─ UniFi Switch Flex XG ═╦═ Veda (Proxmox Virtual Switch)
(500Mbps↑/500Mbps↓)                             UniFi CloudKey Gen2 (PoE) ─┴─ Veda (IPMI)           ╠═ Veda-NAS (HW Passthrough NIC)
╔═══════════════════════════════════════════════════════════════════════════════════════════════════╩═ Narrative (Asus USB 2.5G NIC)
║ ┌────── Closet ──────┐   ┌─────────────── Bedroom ──────────────────────────────────────────────────────┐
╚═ UniFi Switch Flex XG ═╤═ UniFi Switch Flex XG ═╦═ Byarlant
   (PoE)                 │                        ╠═ Narrative (Cable Matters USB-PD 2.5G Ethernet Dongle)
                         │                        ╚═ Jesta Cannon*
                         │ ┌─────────────── Media Center ──────────────────────────────────┐
Notes:                   └─ UniFi Switch 8 ─────────┬─ UniFi Access Point nanoHD (PoE)
═══ is Multi-Gigabit                                ├─ Sony Playstation 4 
─── is Gigabit                                      ├─ Pioneer VSX-S520
* = cable passed to Bedroom from Media Center       ├─ Sony XR65A80K (Google TV)
** = cable passed from Media Center to Bedroom      └─ Work Laptop** (Startech USB-PD Dock)

Retired/Other:

Spoiler

Laptop (Rozen-Zulu): Sony VAIO VPCF13WFX | Core i7-740QM | 8GB Patriot DDR3 | GT 425M | Samsung 850EVO 250GB SSD | Blu-ray Drive | Intel 7260 Wifi (lived a good life, retired with honor)

Testbed/Old Desktop (Kshatriya): Xeon X5470 @ 4.0GHz | ZALMAN CNPS9500 | Gigabyte EP45-UD3L | 8GB Nanya DDR2 400MHz | XFX HD6870 DD | OCZ Vertex 3 Max-IOPS 120GB | Corsair CX430M | HooToo USB 3.0 PCIe Card | Osprey 230 Video Capture | NZXT H230 Case

TrueNAS Server (La Vie en Rose): Xeon E3-1241v3 | Supermicro X10SLL-F | Corsair H60 | 32GB Micron DDR3L ECC 1600MHz | 1x Kingston 16GB SSD / Crucial MX500 500GB

Link to comment
Share on other sites

Link to post
Share on other sites

11 minutes ago, AbydosOne said:

Oh, there's a name for that concept... surprised I haven't heard it before...

 

Couldn't you iterate up the parent IDs and store only one character at each node?

Sure, you could do that. But depending on how you're using that thing, it could be much faster to have what you need right away rather than recursing through the tree. It's essentially a space vs speed tradeoff.

 

~edit: Let's say you need to check if an entry for "tea" already exists, in my example that could be done with a simple select, in your example you'd need to recurse through the entries, starting with "t", to see if you can form that combination.

Remember to either quote or @mention others, so they are notified of your reply

Link to comment
Share on other sites

Link to post
Share on other sites

5 hours ago, Wictorian said:

How should one store a trie in a database?

Is there any context into why you are trying to insert it into a database?

 

I mean there could be a few ways to store it.  If you are ultimately trying to store it as a dictionary, might as well just store it the way a database would work (in a table)...and then rebuild the trie on demand if you need it.

 

e.g. having a table with keys dictionary_id, word
So then you can do a select word from dictionary where dictionary_id = 123;  To get the entire dictionary.  Then iterate through creating the trie.

 

I mean you could I guess use a binary datatype, but I mean ultimately it comes back to needing to still re-import it most likely.

3735928559 - Beware of the dead beef

Link to comment
Share on other sites

Link to post
Share on other sites

1 hour ago, Eigenvektor said:

Sure, you could do that. But depending on how you're using that thing, it could be much faster to have what you need right away rather than recursing through the tree. It's essentially a space vs speed tradeoff.

This is a really good point, creating a recursive will only optimize the storage space, not the query performance. Recursive query is extremely costly on database. If you have a thousand sub level that could be many tens of thousand of query while some speed optimized structure can do it in a single query.

Link to comment
Share on other sites

Link to post
Share on other sites

1 hour ago, wanderingfool2 said:

Is there any context into why you are trying to insert it into a database?

 

I mean there could be a few ways to store it.  If you are ultimately trying to store it as a dictionary, might as well just store it the way a database would work (in a table)...and then rebuild the trie on demand if you need it.

 

e.g. having a table with keys dictionary_id, word
So then you can do a select word from dictionary where dictionary_id = 123;  To get the entire dictionary.  Then iterate through creating the trie.

 

I mean you could I guess use a binary datatype, but I mean ultimately it comes back to needing to still re-import it most likely.

what I mean by database is some dort of file on the hard drive so I dont have to reconstruct it every time I run a script.

Link to comment
Share on other sites

Link to post
Share on other sites

36 minutes ago, Wictorian said:

what I mean by database is some dort of file on the hard drive so I dont have to reconstruct it every time I run a script.

Well I mean insertion into a Trie is pretty lightweight for inserts...it's an O(n) efficiency.

 

I mean you could always get tricky by loading all the data in some set structure so that you can just read in the file and have it in the layout you need [would vary by language].

 

Ultimately if your thought was storing it as a flat file/similar, I would just reconstruct it each time.  Unless there is a bunch of words/phrases that is really slowing it down (or calling it to load a bunch of times) then you might want to consider different methods.

 

You could do things like loading in to memory (easier in things like c/C++ kind of thing)...or other ways that I could think of would really be a lot more effort to implement than just adding in again...ie the performance gains from doing such a thing, is likely not worth the time it would take to get it optimized as such.

 

 

3735928559 - Beware of the dead beef

Link to comment
Share on other sites

Link to post
Share on other sites

8 hours ago, Wictorian said:

what I mean by database is some dort of file on the hard drive so I dont have to reconstruct it every time I run a script.

While at the end of the day a database is technically "just a (bunch of) file(s)", it really isn't. You should normally distinguish between the two.

 

A database is optimized for fast retrieval (e.g. indexes) of individual rows and has its own query language. This can allow you to quickly load some subset of the data without having to keep everything in memory. Depending on how you're using the data, a flat file may "good enough", but as I pointed out above, databases support CTEs which can make retrieval of recursive data much more efficient.

 

In any case, you can still use the same overall structure for a flat file and e.g. use a simple CSV. You can use its columns and rows pretty much the same way as you would a single database table.

 

As @wanderingfool2 said, consider reconstructing it instead. If you're loading everything into memory on startup, loading it from disk is only really worth it if that is faster than reconstructing it on the fly. If you only need a subset and there's a good way to query what you need, then an actual database can start to make sense.

Remember to either quote or @mention others, so they are notified of your reply

Link to comment
Share on other sites

Link to post
Share on other sites

14 hours ago, Wictorian said:

what I mean by database is some dort of file on the hard drive so I dont have to reconstruct it every time I run a script.

If you need to load or modify portion on that data a database is what you need. If you will always and only use that tree when fully loaded you want a simple serialization.

Link to comment
Share on other sites

Link to post
Share on other sites

If you want to store it in a database you can use: Neo4J which is a graph database letting you store "nodes" and "links".

 

Alternativly and probably simpler, you can use graphviz which I use the python library (pypi) to parse graphs / construct and parse etc. These can be saved to ".dot" files and uses the "DOT Language".

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×