learning Binary Search Tree

MisterWhite · June 29, 2016

As I am not well acquainted with pointers, using them for BST is a bit tricky. Right now i'm looking at a tutorial on how to create a binary tree and how to use it.

BST.h

Spoiler



class BTS
{
private:
    struct node
    {
        int key;
        node* left;
        node* right;
    };

    node* root;

    void AddLeafPrivate(int key, node* Ptr);
    void PrintInOrderPrivate(node* Ptr);
    node* returnNodePrivate(int key, node* Ptr);

public:
    BTS();
    node* CreateLeaf(int key);
    void AddLeaf(int key);
    void PrintInOrder();
    node* returnNode(int key);


};

BST.cpp

Spoiler


#include <iostream>
#include <cstdlib>

#include "BTS.h"

using namespace std;

BTS::BTS()
{
    root = NULL;
}
BTS::node* BTS::CreateLeaf(int key)
{
    node* n= new node;
    n->key = key;
    n->left = NULL;
    n->right = NULL;

    return n;
}

void BTS::AddLeaf(int key)
{
    AddLeafPrivate(key,root);
}

void BTS:: AddLeafPrivate(int key, node* Ptr)
{
    if(root == NULL)
    {
        root = CreateLeaf(key);
    }
    else if(key< Ptr->key)
    {
        if(Ptr->left != NULL)
        {
            AddLeafPrivate(key, Ptr->left);
        }
        else
        {
            Ptr->left = CreateLeaf(key);
        }
    }
    else if(key> Ptr->key)
    {
        if(Ptr->right != NULL)
        {
            AddLeafPrivate(key, Ptr->right);
        }
        else
        {
            Ptr->right = CreateLeaf(key);
        }
    }
    else
    {
        cout<<"The key "<<key<<" has already been added"<<endl;
    }
}

void BTS::PrintInOrder()
{
    PrintInOrderPrivate(root);
}

void BTS::PrintInOrderPrivate(node* Ptr)
{
    if(root != NULL)
    {
        if(Ptr->left !=NULL)
        {
            PrintInOrderPrivate(Ptr->left);
        }
        cout<<Ptr->key<<" ";
        if(Ptr->right !=NULL)
        {
            PrintInOrderPrivate(Ptr->right);
        }
    }
    else
        cout<<"The Tree is empty\n";
}

BTS:: node* BTS::returnNode(int key)
{
    return returnNodePrivate(key,root);
}

BTS:: node* BTS::returnNodePrivate(int key, node* Ptr)
{
    if(Ptr !=NULL)
    {
        if(Ptr->key == key)
            return Ptr;
        else
        {
            if(key < Ptr->key)
                return returnNodePrivate(key, Ptr->left);
            else
                return returnNodePrivate(key, Ptr->right);
        }
    }
    else{
        return NULL;
    }
}

Main.cpp

Spoiler


#include <iostream>
#include <cstdlib>

#include "BTS.h"
using namespace std;

int main()
{
    int TreeKeys[16] = {50,76,21,4,32,64,15,52,14,100,83,2,70,87,80,3};
    BTS myTree;
  
    return 0;
}

class BTS
{
private:
    struct node
    {
        int key;
        node* left;
        node* right;
    };

    node* root;
}

1) Why struct node is created in the first place?

2) Why int key is in a struct if we do not keep track of them?

3) What happens here: node* left?

public:
    BTS();
    node* CreateLeaf(int key);

4) Have no clue how to call this node* CreateLeaf(int key); (pointer function?)

BTS::node* BTS::CreateLeaf(int key)
{
    node* n= new node;
    n->key = key;
    n->left = NULL;
    n->right = NULL;

    return n;
}

5)Sheesh, where to begin here.

5.1 why the double BTS:: ?

5.2 n->key = ... clarification here would also be nice. we point to key and assign its value?

Overall why pointers are used?

Yamoto42 · June 29, 2016

1) The nodes are the actual data structure. The class BTS is only a central, and safe, point of access. It can manage creation and cleanup, while keeping other classes from touching things they shouldn't be touching or don't need to know about.

2) You do keep track of the keys. The nodes are the actual data structure, and thusly where you keep track of them.

3) I wont get into balancing, but a binary search tree works by effectively splitting the sorted data list in half at each node. AT ANY NODE the left->key will be strictly less than this->key, and right->key will be strictly greater than this->key. Therefore you are effectively cutting the list in half and starting your search from the middle. Instead of searching the items one at a time, you are cutting the number of items left to search in half at every node. You will either eventually find an equal key meaning success, or a null pointer meaning the item does not exist. Instead of searching N item to find your element, you are searching log₂(N) items, a significantly smaller number of items to search.

4) node* BST::CreateLeaf(int key) is effectively nothing more than a constructor for a node.

5.1) the "::" operator declares the scope of the type, variable, or fucntion. struct node is part of the BTS class. It does not exist apart from the BTS class, and therefore must be referenced as part of BTS, hence the return type is BTS::node* being a pointer to the struct node declared within the class BTS. The function that returns the BTS::node* is also member of the BTS class, therefore it must also be scoped as such.

5.2) A pointer is an address, not a value. If you know another language that uses references, they are more or less (at a birds eye view) the same thing. At a lower level view, it has to do with where in memory the data actually resides.

The only way to dynamically allocate data (not at compile time) is in an area of memory called the heap. The only way to access this area of memory is through pointers, which store the actual memory address of the data. The class and structure then define the physical layout of the data in memory. When you declare a non-pointer, the data goes into another area called the stack, which is a much smaller place and is known at compile time, thus data can be treated as a value. Despite not being "data", the address is technically a value (32/64-bit unsigned integer). Therefore you must use the -> operator to treat it as an address and not a value. If ti was not technically a value, you couldn't read or assign it...


class Foo {
public:
	int Bar;
};

/* on the stack, direct access */
Foo myFoo;

/* on the heap, requires indirect access */
Foo* myFooToo = new Foo();

myFoo.Bar = 0;
myFooToo->Bar = 1

/* prints 0 */
cout << myfoo.Bar << endl;
/* prints 1 */
cout << myFooToo->Bar << endl;

/* myfooToo now points to the foo on the stack
   they are now truly the same foo */
myfooToo = &myfoo;
myFooToo->Bar = 2;

/* prints 2 ^__- */
cout << myFoo.Bar << endl;

That also contains a memory leak. The original myFooToo still exists, but since i reassigned it without deleting it, i have no way to recover its address to clean it up.

References in other language are actually pointers, but the languages cover up the memory management involved. Additionally, a pointer can point anywhere in memory, even the stack, or to another pointer...even to a executable function...but that's just a bonus.

6) Graphs (A tree a specific type of graph) are naturally recursive structures. Without pointers to dynamically create nodes, each node would create two nodes. Since each of those is a node, each of them would create two nodes...so on and so forth giving you a structure of infinite size. Pointers allow nodes to only be created as needed.

MisterWhite · June 29, 2016

35 minutes ago, Yamoto42 said:

-snip-

About the 3)

maybe my question was a bit wrong. struct node - why we use the same name "node" to create a pointer?

~~what is the purpose of "->" why not "." like usually structs members are accessed.~~

Yamoto42 · June 29, 2016

It isn't the same node. Each key has it's own node.

A tree is a unique type of graph, because each node is also the root of another, smaller tree. Searches always start at the global root node declared as part of the BTS class, and as such it is the only one we need to directly keep track of. That root node then keeps track of two different nodes, one with greater key and one with a lesser key. All keys in the sub-tree started by root->left will have lesser keys than root, and all nodes in the sub-tree root->right will have greater keys.

node* base = root;

while ( base != NULL ) {
	
	if (searchKey == base->key)
		return base; // Found it!
	
	else if (searchKey < base->key)
      base = base->left;  // implies ALL keys in left subtree are less
      
    else
      base = base->right; // implies ALL keys in right subtree are greater
      
}
      
// if we get here base == null
return base;

Unless I'm misinterpreting your question.

If you're talking about the structure being declared within the class, that's still just declaring what a BST::node is, not declaring that one exists. Being within the class definition just logically relates it to the class, and in C++ is really little more than naming, since you could also have a different type of struct called "node" outside of the class's scope. The definition defines how memory is layed out, not where it is: Specifically (for variables) as an offset from the address the object is located. It does not define an address, only the offset.

class BST {

public:
  struct node {
      int a;
      int b;
  };
};

struct node {
	double alpha;
	double beta;
};

In this example, node* and BST::node* are both valid types, but mean two entirely different things. Remember, pointers "technically" hold an address as an unsigned integer....so

node* myNode = new Node();
BST::node* myBstNode = new BST::node();


myNode->beta = 0;
// Compiler Translation
Got to address 0x?????????
Go forward 8 bytes
This is a double.  Assign it value 0.0.
// End translation

myBstNode->b = 0;
// Compiler Translation
Got to address 0x?????????
Go forward 4 bytes.
This is a 32-bit integer.  Assign it value 0.
// End translation

I think we can agree there is a large difference between a floating point value and an integer, or at least there's a difference between 4 and 8. It's again about naming. It has to do with telling the compiler what to do.

Even if, say the external definition was the same, the compiler would still consider them as different types. And ultimately that's what that is, its declaring a type: what something is, not that one exists.