-
Hi @akh7177, you're in the right spot to get started. Please review our contributor guidance and reach out here if you have any questions.
-
Hi @mike-hunhoff, As I understand it, the current Ghidra backend of capa was built on Ghidrathon because Ghidra had no Python 3 support at the time (2023). Recent Ghidra builds, however, provide direct access to the Ghidra APIs through the PyGhidra library. Since PyGhidra is distributed along with Ghidra, migrating the backend to PyGhidra seems like the more sustainable long-term approach. Does my understanding align with the project's goals? Thanks!
-
Hello @williballenthin, I was re-reading the contributor guidance and noticed that the application deadline is listed as April 2nd. Is the deadline April 2nd this year as well, or is that a typo?
-
Hello @mike-hunhoff, I’m thinking of writing unit tests to check all functions in each Ghidra feature extractor script and submitting a unit test report along with my updated code as weekly deliverables. This way, I can ensure that the updated scripts work as expected before moving on to integration tests. Does this approach sound good to you? I’d appreciate your feedback. Thanks!
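A minimal sketch of what one such unit test could look like, assuming the Ghidra objects are replaced by `unittest.mock` stand-ins so the test runs without Ghidra installed. The helper `os_from_format` and the factory `make_fake_flat_api` are hypothetical names used only for illustration; they are not part of capa.

```python
import unittest
from unittest import mock


def os_from_format(format_name: str):
    """Hypothetical helper: map a Ghidra executable-format string to an OS label."""
    if "PE" in format_name:
        return "windows"
    # other formats need deeper inspection, so this sketch does not guess
    return None


def make_fake_flat_api(format_name: str):
    """Build a mock that answers getCurrentProgram().getExecutableFormat()."""
    flat_api = mock.MagicMock()
    flat_api.getCurrentProgram.return_value.getExecutableFormat.return_value = format_name
    return flat_api


class TestOsDetection(unittest.TestCase):
    def test_pe_is_windows(self):
        flat_api = make_fake_flat_api("Portable Executable (PE)")
        fmt = flat_api.getCurrentProgram().getExecutableFormat()
        self.assertEqual(os_from_format(fmt), "windows")

    def test_unknown_format_is_not_guessed(self):
        flat_api = make_fake_flat_api("Raw Binary")
        fmt = flat_api.getCurrentProgram().getExecutableFormat()
        self.assertIsNone(os_from_format(fmt))


# run the suite programmatically so the result can be included in a report
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOsDetection)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Mocking keeps the unit tests fast and independent of a Ghidra install; the integration tests would then cover the real API calls.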
-
Hi @mike-hunhoff, My name is Daniel Stanculescu, and I am a first-year master's student in Cybersecurity, currently working as a Junior Researcher. I would love to contribute to this project to deepen my knowledge of reverse engineering and to support the organization. I have experience with Python and with Git/GitHub. I am studying reverse engineering because I plan to base my master's thesis on malware detection using machine learning. For now, I will start by contributing some small changes to the code. Please let me know if there are any other steps I need to take to have a chance to contribute through the Google Summer of Code 2025 program. Also, do I need to start another discussion, or can I just write here?
-
Hello @mike-hunhoff 👋 I spent this Sunday working on my GSoC application! Would you be able to review my draft and share your feedback? Thanks!
-
Hi @mike-hunhoff, |
-
Hi @mike-hunhoff , |
-
Hello @mike-hunhoff, I tried migrating just the extract_os function in global_.py to PyGhidra and it works well, except that there is a noticeable delay while PyGhidra initiates its connection.
Please let me know your opinion on this! ✨ Below are the scripts that I modified. This is the test file that I used:
# test.py
import global_
import pyghidra

pyghidra.start()
if pyghidra.started():
    print("Pyghidra Started")

with pyghidra.open_program("../../../../tests/data/2bf18d0403677378adad9001b1243211.elf_") as flat_api:
    for os_feature, address in global_.extract_os(flat_api):
        print(f"Detected OS: {os_feature.value}")

This is the modified global_.py. I have included only the modified part of the code. As a temporary solution, I created a GHIDRAIO class in this file itself.
# global_.py
# imports used by this excerpt
import logging
import contextlib
from typing import Iterator

import capa.features.extractors.elf
import capa.features.extractors.ghidra.helpers
from capa.features.common import OS, OS_WINDOWS, Feature
from capa.features.address import NO_ADDRESS, Address

logger = logging.getLogger(__name__)


class GHIDRAIO:
    """File-like wrapper over the program's original file bytes."""

    def __init__(self, flat_api):
        super().__init__()
        self.offset = 0
        self.flat_api = flat_api
        self.bytes_ = self.get_bytes()

    def seek(self, offset, whence=0):
        # only absolute seeks are supported
        assert whence == 0
        self.offset = offset

    def read(self, size):
        # ensure we read only within the extracted bytes
        if self.offset + size > len(self.bytes_):
            logger.debug("Cannot read 0x%x bytes at 0x%x (out of bounds)", size, self.offset)
            return b""
        result = self.bytes_[self.offset : self.offset + size]
        self.offset += size
        return result

    def close(self):
        pass

    def get_bytes(self):
        program = self.flat_api.getCurrentProgram()
        memory = program.getMemory()
        file_bytes = memory.getAllFileBytes()[0]
        # getOriginalByte() allows for raw file parsing on the Ghidra side
        # other functions will fail as Ghidra will think that it's reading uninitialized memory
        bytes_ = [file_bytes.getOriginalByte(i) for i in range(file_bytes.getSize())]
        return capa.features.extractors.ghidra.helpers.ints_to_bytes(bytes_)


def extract_os(flat_api) -> Iterator[tuple[Feature, Address]]:
    program = flat_api.getCurrentProgram()
    format_name: str = program.getExecutableFormat()
    if "PE" in format_name:
        yield OS(OS_WINDOWS), NO_ADDRESS
    elif "ELF" in format_name:
        with contextlib.closing(GHIDRAIO(flat_api)) as f:
            os = capa.features.extractors.elf.detect_elf_os(f)
        yield OS(os), NO_ADDRESS
    else:
        logger.debug("unsupported file format: %s, will not guess OS", format_name)
        return
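For anyone who wants to check the seek/read behavior of the GHIDRAIO class above without starting Ghidra, here is a minimal sketch that backs the same interface with an in-memory bytes object. `BytesBackedIO` and `read_elf_ident` are hypothetical names for illustration only, and the 16-byte header is fabricated test data, not taken from the sample file.

```python
import contextlib


class BytesBackedIO:
    """Hypothetical stand-in for GHIDRAIO, backed by a plain bytes object."""

    def __init__(self, data: bytes):
        self.bytes_ = data
        self.offset = 0

    def seek(self, offset: int, whence: int = 0) -> None:
        # only absolute seeks are supported, mirroring the class above
        assert whence == 0
        self.offset = offset

    def read(self, size: int) -> bytes:
        # a read that would run past the end returns b"" instead of a short read
        if self.offset + size > len(self.bytes_):
            return b""
        result = self.bytes_[self.offset : self.offset + size]
        self.offset += size
        return result

    def close(self) -> None:
        pass


def read_elf_ident(f) -> bytes:
    """Read the 16-byte ELF identification block from the start of the file."""
    f.seek(0)
    return f.read(16)


# fabricated e_ident: magic, 64-bit, little-endian, version 1, then padding
fake_elf = b"\x7fELF\x02\x01\x01\x00" + b"\x00" * 8

with contextlib.closing(BytesBackedIO(fake_elf)) as f:
    ident = read_elf_ident(f)
    past_end = f.read(1)  # the offset is now at the end of the buffer
```

One consequence of this design is that a consumer asking for more bytes than remain gets an empty result, not a partial one, so callers that expect short reads near the end of the file would need to handle that case.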
-
Hi @mike-hunhoff, Based on my understanding, by running this script in the Ghidra Script Manager, I am essentially providing it with the same environment that Ghidrathon uses, giving it PyGhidra’s context with native Ghidra API access. This is something we aim to maintain throughout the implementation, rather than executing capa-ghidra from outside the Ghidra environment. Am I correct?
-
Hi @mike-hunhoff,
I’m Abhyuday Hegde, and I came across the “Migrate to PyGhidra” project from @mandiant/flare-gsoc on GitHub, listed for GSoC 2025. I’m interested in contributing to the migration of the capa Ghidra backend to PyGhidra and would like to continue supporting the project beyond the GSoC period.
I have experience with Python and Git/GitHub, but I’m still familiarizing myself with Ghidra and capa. I’m eager to help with tasks like porting functionality, testing, and updating documentation.
Please let me know if there are any next steps or if you’d like to discuss how I can contribute to the project.