Thursday, May 02, 2024

SQL Essential Training - LinkedIn

  • Datum - piece of information
  • Data is plural of datum. Data are piece of information - text, images or video.
  • Database - collection of data. Organized in many ways - tables. 
  • Tables - rows, and columns. Excel spreadsheet.
  • Col specifies attribute of that data.

An RDBMS (Relational Database Management System) is a program used to create, update, and manage relational databases. In a relational database, data is organized into tables, with each table containing rows (also known as records or tuples) and columns (attributes). Here are some key points about RDBMS:

  1. Table Structure:
    • An RDBMS structures information in tables, rows, and columns.
    • Each table represents a specific type of data (e.g., Customers, Orders).
    • Columns define the attributes (e.g., Customer ID, Order Date).
    • Rows contain actual data entries (e.g., individual customer records).
  2. Relationships:
    • RDBMS allows establishing relationships between tables using common attributes.
    • Instead of hierarchical structures, data is stored in related tables.
    • Primary keys uniquely identify rows, and foreign keys link related data.
  3. Example:
    • Consider a Customer table and an Order table:
      • Customer Table:
        • Customer ID (primary key)
        • Customer name
        • Billing address
        • Shipping address
      • Order Table:
        • Order ID (primary key)
        • Customer ID (foreign key)
        • Order date
        • Shipping date
        • Order status
    • By linking the Customer ID in both tables, we establish a relationship.
  4. Well-Known RDBMSs:
    • Some popular RDBMSs include MySQL, PostgreSQL, MariaDB, Microsoft SQL Server, and Oracle Database.

Here are some of the most popular Relational Database Management Systems (RDBMS):

  1. Oracle: As of September 2023, Oracle is the most popular RDBMS in the world, with a ranking score of 1240.88. It also holds the top position overall among all DBMS11.

  2. MySQL: MySQL is widely used and known for its open-source nature. It’s a popular choice for web applications and small to medium-sized databases.

  3. Microsoft SQL Server: Developed by Microsoft, SQL Server is commonly used in enterprise environments. It offers robust features, scalability, and integration with other Microsoft products.

  4. PostgreSQL: PostgreSQL is an open-source RDBMS known for its extensibility, ACID compliance, and support for advanced data types. It’s popular among developers and data professionals.

  5. IBM DB2: IBM DB2 is an enterprise-grade RDBMS with features like high availability, security, and scalability. It’s commonly used in large organizations.

  6. Microsoft Access: While not as powerful as the others, Microsoft Access is widely used for small-scale databases and desktop applications.

  7. SQLite: SQLite is a lightweight, embedded RDBMS often used in mobile apps and small projects.



WSDA Music


Company management wants to know what can we learn from the data? is there any useful info about sales, cust demographics, any ways company can improve / expand sales.

SQLite

SQLite is not a lighter version of SQL itself; rather, it is a lightweight relational database management system (RDBMS) that adheres to SQL specifications. Let’s explore why it’s named as such:


  1. Lightweight and Embedded:

    • SQLite focuses on providing a powerful SQL-compatible database without overheads or dependencies.
    • As the name implies, it’s a lightweight solution that can run on almost anything that supports C and persistent file storage.
    • Unlike traditional database systems that require a separate server process, SQLite is serverless and integrates directly into the application it serves.
  2. Key Features:

    • Embeddable: SQLite is embedded within the application, eliminating the need for a separate database server.
    • SQL Compatibility: Despite its lightweight nature, SQLite supports a vast majority of SQL standard features, making it robust enough for various applications.
    • File-Based: It operates directly on files, making it easy to manage and distribute.
  3. Use Cases:

    • Mobile Devices: SQLite is commonly used in mobile devices (such as Android and iOS) due to its small footprint and efficient storage.
    • Embedded Systems: It’s also popular in embedded systems, IoT devices, and desktop applications.
    • Testing and Prototyping: Developers often use SQLite for testing, prototyping, and small-scale projects.



Functions of each tab in DB Browser for SQLite:

  1. Database Structure:

    • In this tab, you can:
      • Create new database tables.
      • List existing database tables.
      • Delete database tables.
      • Define the structure of your database by specifying table names, columns, and their data types.
  2. Browse Data:

    • Here, you can:
      • View the actual data stored in your tables.
      • Browse through rows and columns.
      • Add new rows or modify existing data.
      • Essentially, it allows you to interact with the data in your database.
  3. Edit Pragmas:

    • The Edit Pragmas tab deals with system-wide parameters (pragmas) related to SQLite.
    • Pragmas are special commands that control various aspects of SQLite behavior.
    • You won’t typically need to change these settings unless you have specific requirements.
  4. Execute SQL:

    • This tab allows you to:
      • Write and execute SQL queries directly.
      • Query your database for specific information.
      • Inspect query results.
      • Perform operations like SELECT, INSERT, UPDATE, and DELETE.

Aliases



Operator Types



  • Select * from invoice where total in (1.98, 3.96)
  • Select * from Invoice where BillingCity in('Brussels', 'Orlando', 'Paris')
  • Select * from Invoice where BillingCity Like ('b%')
  • select * from invoice where total>1.98 AND (BillingCity like 'p%' OR BillingCity like 'd%')
  • Case statement
SELECT *,
CASE
When total < 2 THEN 'Baseline Purchase'
When total BETWEEN 2 and 6.99 Then 'Low Purchase'
When total between 7 and 15 then 'Target purchase'
Else 'Top performer'
END AS PurchaseType
From Invoice;

 


Filtering only the top performers by including the where clause....


JOINS

Getting data from 2 or more tables in a single SQL statement

Full list of customers (firstname and lastname) against all the invoices generated against that customer.

Entity-Relationship diagram for Invoice and Customer relationship.  


Select * from Invoice
INNER JOIN
Customer
ON
Invoice.CustomerID = Customer.CustomerID
order by customer.CustomerId

Aliasing

Select c.CustomerId, c.LastName, c.FirstName, i.InvoiceId, i.InvoiceDate from Invoice as i
INNER JOIN
Customer as c
ON
i.CustomerID = c.CustomerID
order by c.CustomerId


Discrepancies between tables are handled with different join types


  • Customer with customerid 6 cannot be found in the customer table.
  • Customers with customerid 1 and 5  do not have entries in the invoice table. 
Inner Join

  • Inner join returns only matchting records.
  • Any umatched data from either tables is ignored.

Inner join will ignore customer 6 and customer 1 and 5. 






Sunday, April 28, 2024

RAG - Retrieval Augmented Generation AI

Courtesy: Databricks.com

Retrieval augmented generation, or RAG, is an architectural approach that can improve the efficacy of large language model (LLM) applications by leveraging custom data. This is done by retrieving data/documents relevant to a question or task and providing them as context for the LLM. RAG has shown success in support chatbots and Q&A systems that need to maintain up-to-date information or access domain-specific knowledge.

RAG is the right place to start, being easy and possibly entirely sufficient for some use cases. Fine-tuning is most appropriate in a different situation, when one wants the LLM's behavior to change, or to learn a different "language." These are not mutually exclusive. As a future step, it's possible to consider fine-tuning a model to better understand domain language and the desired output form — and also use RAG to improve the quality and relevance of the response.


When I want to customize my LLM with data, what are all the options and which method is the best (prompt engineering vs. RAG vs. fine-tune vs. pretrain)?

There are four architectural patterns to consider when customizing an LLM application with your organization's data. These techniques are outlined below and are not mutually exclusive. Rather, they can (and should) be combined to take advantage of the strengths of each.

Friday, April 26, 2024

LLM / SLM Parameters

What do we understand by LLM or SLM Parameters?

**Parameters** in deep learning, including language models, are adjustable values that control the behavior of neural networks. These parameters are learned during training and determine how the model processes input data.

In LLMs and SLMs, parameters typically include:

1. **Weight matrices**: These matrices contain the numerical values that are multiplied by input vectors to produce output activations.

2. **Bias terms**: These are additive constants added to the weighted sum of inputs to adjust the activation function's output.

3. **Learned embeddings**: These are fixed-size vector representations of words, phrases, or tokens learned during training.

The number and complexity of these parameters directly impact the model's performance, accuracy, and computational requirements. More parameters often allow for more nuanced learning and better representation of complex linguistic patterns, but also increase the risk of overfitting and computational costs.

In the context of LLMs, having **billions** of parameters means that the model has an enormous number of adjustable values, allowing it to capture subtle relationships between words, contexts, and meanings. This complexity enables LLMs to achieve impressive results in tasks like language translation, question answering, and text generation.

Conversely, SLMs typically have fewer parameters (often in the tens or hundreds of thousands), which makes them more efficient but also less capable of capturing complex linguistic patterns.

SQL: A Practical Introduction for Querying Databases (IBM)

This is a refresher course. I already have a diploma in Oracle RDBMS.

Structured Query Language

  • A language for relational databases
  • Used to query data
Data

Collection of facts (words, numbers), pictures. 
  • Data needs to be secured, and accessed when required.
  • The above can be achieved using a Database.
Database
  • Is a repository of data / it is a program that stores data.
  • Provides functionality for adding, modifying and querying data.
Different kinds of datbases
  • Relational database
    • Data is stored in tabular form - columns & rows
    • Like in spreadsheet
    • Cols - has properties about each item - such as last name, first name, email address, etc.
    • Table is a collection of related things - example employees, salary, etc.
    • In a relational database, you can form relationship between tables. 
    • Emp table, Salary table, etc. etc.
    • RDBMS
  • DMBS
    • Database Management System - set of software tools for the data in the database is called DBMS.
    • Database is a repository of data  
    • Terms database, database server, database system, data server, DBMS are all used interhangeably.
RDBMS
  • MySQL, DB2, Oracle
SQL
  • Create table
  • Insert data to a table
  • Select statement to see the data in a table
  • Update data in a table
  • Delete data from the table

Types of SQl Statements

- Data Definition Language
- Data Manipulation Language
- Data Control Language
- Transaction Control Language
Image courtesy: Geeksforgeeks.com

Results coming in from an SQL query is called a table or a Result Set.

Define Table structure of an existing table

sp_help BOOK;



DML - Read and Modify data

Select Statement
  • Select col1, col2, col3 from FilmLocations;
  • Select * from FilmLocations;
  • Select count(*) from FilmLocations;
  • Select DISTINCT(Directors) from FilmLocations;
  • Select * from FilmLocations LIMIT 25;
  • SELECT DISTINCT Title FROM FilmLocations WHERE ReleaseYear=2015 LIMIT 3 OFFSET 5; Retrieve the next 3 film names distinctly after first 5 films released in 2015.
INSERT Statement
  • Insert into TableName ColumnName1, ColumnName2, ColumnNameN values <Value1>, <Value2>, <ValueN>;
  • Inserting 1 row at a time:
    • Insert into AUTHOR (Author_ID, Lastname, Firstname, Email, City, Country) values ('A1', 'Chong', 'Raul', 'rfc@ibm.com', 'Toronto', 'CA');
  • Multiple rows can be inserted.
    • Insert into Author (Author_ID, Lastname, Firstname, Email, City, Country) 
                    values
                    
                    ('A1', 'Chong', 'Raul', 'rfc@ibm.com', 'Toronto', 'CA')
                    ('A2', 'Ahuga', 'Rav', 'ra@ibm.com', 'Toronto', 'CA')

                     ); 
  • Insert into Instructor(ins_id, lastname, firstname, city, country) VALUES (8, 'Ryan', 'Steve', 'Barlby', 'GB'), (9, 'Sannareddy', 'Ramesh', 'Hyderabad', 'IN');   

Entity name - Author
Entity Attibutes - Author_ID, Country, City, Email, FirstName, LastName.

Update statement
  • Alter data in a table using UPDATE statement.
  • Update TableName SET ColumnName1=Value1 WHERE [condition];
  • Example, Update AUTHOR SET Lastname='KATTA', Firstname='Lakshmi' WHERE AUTHOR_Id='A2';
  • If where clause is not specified all rows will be updated.

Delete statement
  • Read and modify data.
  • Delete from TableName WHERE <condition>;
  • Delete from AUTHOR WHERE Author_ID in (a1, a2); both the rows will be deleted.
  • If where clause is not specified all rows will be deleted.


Relational Databases

1. RELATIONAL MODEL



2. E-R MODEL (Entity - Relationship Model)

Book - entity (drawn as rectanges in E-R diagrams)
Attribute - Title, Description, etc. (drawn as ovals in E-R diagrams)


  • Entity Book becomes a table in a database. 
  • Attributes become the columns.


PK uniquely identifies each Tuple / Row in a table. 



Foreign Keys are Primary Keys defined in other tables. They create link between the tables. In the above example, Author ID is a primary key in Author table, but in the Author List table, it is a Foreign Key linking the table Author List to the table Author.

Common Datatypes include Characters (VARCHAR), Numbers, and Date/Times.

DDL VS DML

  • DDL - define, change and drop data. 
    • CREATE
    • ALTER
    • TRUNCATE
    • DROP
  • DML - read and modify data in tables. 
    • Also known as CRUD operations.
    • INSERT
    • SELECT
    • UPDATE
    • DELETE
CREATE



Create table AUTHOR (

AUTHOR_ID  Int PRIMARY KEY NOT NULL,
Lastname varchar (30) NOT NULL,
firstname varchar (30) NOT NULL,
email varchar (30),
city varchar (30),
country varchar (30)

)

select * from AUTHOR

Create table BOOK (

BOOK_ID  Int PRIMARY KEY NOT NULL,
Title varchar (30),
Edition Int,
BYear Int,
Price Decimal (5,2),
ISBN varchar (6),
Pages Int,
Aisle Int,
Description varchar (80)
)

Select * from BOOK;


Create table BORROWER (

BORROWER_ID  Int PRIMARY KEY NOT NULL,
Lastname varchar (30) NOT NULL,
firstname varchar (30) NOT NULL,
email varchar (30),
Phone Int,
Address varchar (80),
City varchar (30),
Country varchar (30),
BRDescription varchar (80)

)

select * from BORROWER;


Create table AUTHOR_LIST (
AUTHOR_ID Int Foreign Key REFERENCES AUTHOR(AUTHOR_ID),
BOOK_ID Int Foreign Key REFERENCES BOOK(BOOK_ID),
AuthRole varchar (30)

)

Create table COPY (
COPY_ID Int PRIMARY KEY NOT NULL,
BOOK_ID Int Foreign Key REFERENCES BOOK(BOOK_ID),
STATUS varchar (6)
)

ALTER, DROP, TRUNCATE Tables

  1. Alter
    1. Add / Remove columns
    2. Modify datatype of a col
    3. Add / remove keys
    4. Add / remove constraints
            Alter table <table_name>
            Add col <datatype>,
            Add col2<datatype>;

Add cols

Alter table AUTHOR 
Add Qualification varchar(30);

Delete cols           

           Alter table BOOK
           drop COLUMN Qual;

Modify datatype of a col

ALTER TABLE AUTHOR
ALTER COLUMN Qualification VARCHAR(4); //Instead of Varchar(30) or you can modify datatype to another (string to int, etc.). 

Constraints

Create table myKeyConstraints (

sno int PRIMARY KEY NOT NULL,

firstname varchar(30),

lastname varchar(30)

);

sp_help mykeyconstraints; // Check the constraint name highlighted below....


Drop constraint

alter table myKeyConstraints
drop constraint PK__myKeyCon__DDDF64469E1B9A37;

// Constraint is gone...


PRIMARY KEY = UNIQUE + NOT NULL








LLM Quality vs Size small language models

 Courtesy: Microsoft.com 




SQL Essential Training - LinkedIn

Datum - piece of information Data is plural of datum. Data are piece of information - text, images or video. Database - collection of data. ...