Add idb helper tool and Dirtree parser #7

Closed
35 commits
4e9b853
cargo init
rbran Aug 28, 2024
e5e5570
Merge branch 'main' of github.com:Vector35/idb-rs
rbran Sep 2, 2024
a4facb1
Merge branch 'main' of github.com:Vector35/idb-rs
rbran Sep 23, 2024
3c19fad
Merge branch 'main' of github.com:Vector35/idb-rs
rbran Sep 24, 2024
3c99f3e
add idb helper tool
rbran Sep 25, 2024
f35cc52
implement a address info and its dump tool
rbran Sep 26, 2024
504694a
implement the id0 dump tool
rbran Sep 26, 2024
2077f72
bump the version
rbran Sep 26, 2024
ee1889c
add a fileformat documentation stub
rbran Sep 26, 2024
1756d02
initial implementation of dirtree
rbran Sep 30, 2024
8c70bc4
improve dirtree parsing capabilities
rbran Oct 1, 2024
f8440e2
implement a few extra dirtrees
rbran Oct 1, 2024
666e3e5
unify the read id0 into a function
rbran Oct 1, 2024
758be09
fix dirtree index wrapping on 32bits
rbran Oct 1, 2024
51f6f48
include dirtree funcs to the dump-functions dump tool
rbran Oct 1, 2024
6240c1a
bump version to 0.1.2
rbran Oct 2, 2024
7b7c7c8
reutilize the comments enum on funcs parsing
rbran Oct 3, 2024
18c3879
add basic documentation on id0 section
rbran Oct 3, 2024
93551ff
allow out-of-order entries in dirtree
rbran Oct 3, 2024
528dd03
add and fix documentation
rbran Oct 3, 2024
6d4616a
bump version to 0.1.3
rbran Oct 3, 2024
2da9d02
update tests
rbran Oct 3, 2024
fdf6d0c
mark decompress functions as allow dead_code
rbran Oct 3, 2024
3611177
fix dirtree parsing for folder with many entries
rbran Oct 4, 2024
2197af2
Allow non UTF8 comments
rbran Oct 4, 2024
836e29b
implement an older version of dirtree
rbran Oct 4, 2024
1bb8036
allow unknown compiler on root info value
rbran Oct 7, 2024
3dd0c8b
implement til decompression
rbran Oct 7, 2024
ef33abc
impl ordinal aliases list
rbran Oct 7, 2024
5dea7e1
add unknown comment format
rbran Oct 7, 2024
ff06305
fix missing parent on dirtree entry version 0
rbran Oct 7, 2024
5957e9e
fix ordinal alias search on til solving
rbran Oct 7, 2024
bfb59f5
reimplement readers into a centralized impl and remove read strings
rbran Oct 8, 2024
b87f34e
don't auto solve dirtree values, allowing non-existing entries
rbran Oct 8, 2024
bcb0acd
bump version to 0.1.4
rbran Oct 9, 2024
418 changes: 418 additions & 0 deletions Cargo.lock

Large diffs are not rendered by default.

9 changes: 7 additions & 2 deletions Cargo.toml
@@ -1,14 +1,19 @@
 [package]
 name = "idb-rs"
-version = "0.1.0"
+version = "0.1.4"
 authors = ["Rubens Brandao <[email protected]>"]
 edition = "2021"
 license = "Apache-2.0"
 license-file = "LICENSE"
 
 [dependencies]
-anyhow = "1.0.86"
+anyhow = { version = "1.0.86", features = ["backtrace"] }
+clap = { version = "4.5.*", features = ["derive"] }
 bincode = "1.3.3"
 flate2 = "1.0.31"
 serde = { version = "1.0", features = ["derive"] }
 serde_repr = "0.1.19"
+
+[[bin]]
+name = "idb-tools"
+path = "src/tools/tools.rs"
6 changes: 5 additions & 1 deletion README.md
@@ -10,6 +10,10 @@ Special thanks to [Willi Ballenthin] and [willem] for IDB format research:
 
 TODO
 
+## Documentation
+
+IDB file format documentation: [fileformat.md](doc/fileformat.md).
+
 ## License
 
 This plugin is released under the Apache-2.0 license.
@@ -20,4 +24,4 @@ Dependency licenses can be found [here](https://nightly.link/Vector35/idb-rs/wor
 
 [Willi Ballenthin]:https://github.com/williballenthin
 [willem]:https://github.com/nlitsme
-[cargo-about]:https://github.com/EmbarkStudios/cargo-about/
+[cargo-about]:https://github.com/EmbarkStudios/cargo-about/
90 changes: 90 additions & 0 deletions doc/fileformat.md
@@ -0,0 +1,90 @@
# IDB format

The IDB format consists mainly of a header holding the offsets of its sections.

Known extensions are `*.idb` for the 32-bit version and `*.i64` for the 64-bit version.

NOTE: In this doc, the word `section` refers to a section of data in the IDB file, while a `binary-section` is a section of data of the original binary (elf, dll, exe) file.

## File overview

```txt
IDB File             +-----------------------------------------------------------+
Start of the file    |[ File Header with offsets for all the sections ][ align ]|
Offset for Section A |[ Section A Header | Section A bytes.......................|
                     |...........................................................|
End for Section A    |..........................................................]|
Offset for Section B |[ Section B Header | Section B bytes.......................|
                     |...........................................................|
End for Section B    |..........................................................]|
                     +-----------------------------------------------------------+
```


## Sections

The IDB file contains the following sections:

* ID0: Database with most of the metadata.
* ID1: Binary data and information about each byte.
* ID2: Unknown data.
* NAM: Unknown data.
* TIL: Database of types from known libraries.
* SEG: Unknown data.

Each section includes a header with its size, so it's possible to ensure that sections don't overlap and,
once the sections are parsed, to tell whether all the data was parsed or some of it was left unparsed.
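
The overlap check mentioned above could look roughly like the following. This is a minimal sketch, not the crate's implementation: `SectionRange` and `sections_are_disjoint` are hypothetical names, and it assumes each section's offset and size have already been read from the headers.

```rust
/// Hypothetical description of one section: where it starts in the file
/// and how many bytes its header says it occupies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SectionRange {
    pub offset: u64,
    pub size: u64,
}

impl SectionRange {
    /// One-past-the-end offset, or `None` on arithmetic overflow.
    pub fn end(&self) -> Option<u64> {
        self.offset.checked_add(self.size)
    }
}

/// Returns `true` when no two sections share a byte and all of them fit
/// inside a file of `file_len` bytes.
pub fn sections_are_disjoint(sections: &[SectionRange], file_len: u64) -> bool {
    // Sort a copy by start offset so only neighbours need comparing.
    let mut sorted = sections.to_vec();
    sorted.sort_by_key(|section| section.offset);

    let mut previous_end = 0u64;
    for section in &sorted {
        let Some(end) = section.end() else {
            return false;
        };
        if section.offset < previous_end || end > file_len {
            return false;
        }
        previous_end = end;
    }
    true
}
```

Sorting first keeps the check to a single pass; the input order of the sections does not matter.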


### ID0

The main database of the project; it contains a list of keys and values.

It's stored in a btree format, but if you only care about the parsed ID0, it's just a vector with each entry being `{key: Vec<u8>, value: Vec<u8>}`,
and the vector is sorted by key.
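
Because the parsed entries are sorted by key, lookups can use a binary search. The sketch below is illustrative only: `Id0Entry`, `find_value`, and `entries_with_prefix` are hypothetical names, and it assumes the sort order is plain bytewise (lexicographic) comparison of the keys.

```rust
/// One parsed ID0 entry, mirroring the `{key, value}` shape described above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Id0Entry {
    pub key: Vec<u8>,
    pub value: Vec<u8>,
}

/// Exact-key lookup; relies on `entries` being sorted by key
/// (assumed here to be bytewise lexicographic order).
pub fn find_value<'a>(entries: &'a [Id0Entry], key: &[u8]) -> Option<&'a [u8]> {
    entries
        .binary_search_by(|entry| entry.key.as_slice().cmp(key))
        .ok()
        .map(|index| entries[index].value.as_slice())
}

/// All entries whose key starts with `prefix`, as one contiguous sub-slice.
pub fn entries_with_prefix<'a>(entries: &'a [Id0Entry], prefix: &[u8]) -> &'a [Id0Entry] {
    // First entry whose key is not smaller than the prefix.
    let start = entries.partition_point(|entry| entry.key.as_slice() < prefix);
    // In a sorted list, matching keys follow each other without gaps.
    let len = entries[start..]
        .iter()
        .take_while(|entry| entry.key.starts_with(prefix))
        .count();
    &entries[start..start + len]
}
```

Returning borrowed slices avoids copying keys or values when only a few entries are needed.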

Inside the btree structure, the section is divided into pages (usually 0x2000 bytes).
Each page contains 0 or more btree entries, each one being a node (points to other pages) or a leaf (points to just data).

Each page has its entries at the start; the offset of each entry's key/value is relative to its page, and the key/value data is usually stored at the end of the page.
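
Page-relative addressing can be sketched as below. This only illustrates slicing a section into fixed-size pages and resolving an offset inside one page; `page` and `bytes_in_page` are hypothetical helpers, the entry encoding itself is not modelled, and it assumes pages start at the beginning of the byte slice passed in.

```rust
/// Page size the text calls "usual"; real files may declare another value.
pub const USUAL_PAGE_SIZE: usize = 0x2000;

/// Borrows page number `index` from the raw section bytes, if it is complete.
pub fn page(section: &[u8], page_size: usize, index: usize) -> Option<&[u8]> {
    let start = index.checked_mul(page_size)?;
    let end = start.checked_add(page_size)?;
    section.get(start..end)
}

/// Resolves a page-relative `(offset, len)` pair to the bytes it names.
/// Returns `None` when the range would cross the end of the page.
pub fn bytes_in_page(page: &[u8], offset: usize, len: usize) -> Option<&[u8]> {
    let end = offset.checked_add(len)?;
    page.get(offset..end)
}
```

Using `get` with checked arithmetic turns a corrupt offset into `None` instead of a panic.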

It's possible that some data in this section is not parsed, mostly because deleted data is not removed from the file; it's just left unlinked from the btree.

Although the id0 data format is simple and very well understood, the data stored inside id0 can be very complex or unknown.


### ID1

The bytes, and the per-byte information, loaded from the original binary file.

It's stored sequentially with a page size of 0x2000 (aligned or not, depending on the version), and the parsed output is just a list of binary-sections.
Each binary-section starts at a specific offset and has all the raw bytes of the binary-section; it also includes 24 bits of unknown information for each byte.

It's possible that some data in this section is not parsed, because some examples were seen with extra data stored after all the binary-sections are parsed.
This is possibly some vestigial data from the original binary.
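
One way to picture "a raw byte plus 24 bits of information" is a 32-bit word per address. The packing used below (little-endian word, byte value in the low 8 bits, information in the upper 24) is an assumption made for illustration and is not stated by this document; `ByteInfo`, `split_word`, and `decode_words` are hypothetical names.

```rust
/// One address worth of ID1 data under the ASSUMED packing described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ByteInfo {
    /// The raw byte from the original binary.
    pub byte: u8,
    /// The 24 bits whose meaning the text leaves unknown.
    pub info: u32,
}

/// Splits one 32-bit word into its byte and its 24 information bits.
pub fn split_word(word: u32) -> ByteInfo {
    ByteInfo {
        byte: (word & 0xFF) as u8,
        info: word >> 8,
    }
}

/// Decodes a run of little-endian 32-bit words.
/// Returns `None` if the length is not a multiple of four.
pub fn decode_words(raw: &[u8]) -> Option<Vec<ByteInfo>> {
    if raw.len() % 4 != 0 {
        return None;
    }
    Some(
        raw.chunks_exact(4)
            .map(|c| split_word(u32::from_le_bytes([c[0], c[1], c[2], c[3]])))
            .collect(),
    )
}
```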


### ID2

The contents and format of this section are not known at this time.


### NAM

The NAM section is known to contain a list of bytes; what this data means is unknown.

It's unlikely that data is left unparsed, because the entire section is parsed and any in-between data is enforced to be only zeroes.
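
The "only zeroes" rule for in-between data can be enforced with a small check such as this one. It is a sketch with hypothetical helper names (`first_nonzero`, `ensure_zero_padding`), assuming the gap bytes have already been sliced out of the section.

```rust
/// Returns the offset of the first non-zero byte in `gap`,
/// or `None` if every byte is zero.
pub fn first_nonzero(gap: &[u8]) -> Option<usize> {
    gap.iter().position(|&byte| byte != 0)
}

/// Fails with a descriptive message when the gap holds anything but zeroes.
pub fn ensure_zero_padding(gap: &[u8]) -> Result<(), String> {
    match first_nonzero(gap) {
        None => Ok(()),
        Some(offset) => Err(format!(
            "unexpected non-zero byte 0x{:02x} at gap offset {offset}",
            gap[offset]
        )),
    }
}
```

Reporting the offending offset makes it easier to inspect files that break the assumption.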


### TIL

The section contains type/macro information from external libs, like win32, gcc, libc, etc.

This section is most likely always fully parsed, because any extra data will result in an error.

NOTE: All IDA versions include a `til` directory in their installation folder with multiple til files; those can be used for testing.
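
Treating extra data as an error can be done by probing the reader once parsing has finished. The function below is a minimal sketch using only `std::io::Read`; `ensure_fully_consumed` is a hypothetical name, not the crate's API.

```rust
use std::io::{Error, ErrorKind, Read, Result};

/// Succeeds only if `reader` has no bytes left, i.e. the parser consumed
/// the whole section. Intended to be called after the last field is read.
pub fn ensure_fully_consumed(mut reader: impl Read) -> Result<()> {
    let mut probe = [0u8; 1];
    match reader.read(&mut probe)? {
        // A zero-length read into a non-empty buffer signals end of input.
        0 => Ok(()),
        _ => Err(Error::new(
            ErrorKind::InvalidData,
            "trailing data after the end of the section",
        )),
    }
}
```

Since `&[u8]` implements `Read`, the same check works on an in-memory slice as well as on a file-backed reader.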


### SEG

The contents and format of this section are not known at this time.
Empty file.
Empty file added resources/tils/til_files_here
Empty file.