Burnchi

Information Collection and Summary

Why Collect Information?#

The purpose of intelligence gathering is to obtain accurate information about the penetration target, to understand how the target organization operates, and to determine the best attack route. All of this should be done quietly, without letting the other party detect your presence or deduce your intentions. Information gathering is one of the most important stages of penetration testing: before testing can begin, you need to collect basic information about the target host. ==The more information you obtain, the higher the probability of a successful penetration test==.

Classification of Information Gathering#

  • Passive Information Gathering: ==collecting information about the target through third-party services== such as Google search, Shodan, and other aggregation tools, without touching the target directly. The goal is to gather as much target-related information as possible without leaving a trace.
  • Active Information Gathering: ==directly scanning the target host or website==. Active methods obtain more information, but the target system may log your operations.

What Information Should Be Collected?#

| IP Resources | Server Information | Website Information | Human Resources |
| --- | --- | --- | --- |
| Real IP | Operating system type and version | CMS | Domain owner, registrar |
| Side site information | Open ports | WAF | Phone number |
| C-class hosts | | Web middleware | Email |
| | | Development language | Various privacy |
| | | Database | |
| | | API, specific files | |

Information Gathering Methods#

1. Real IP#

01. Determine if it is a Real IP#

Before talking about real IPs, let's briefly introduce CDN technology (==Content Delivery Network==). To keep the network stable and transmission fast, website providers deploy node servers at different locations on the network and use a CDN to route each request to the optimal node server.

  • Online Website Query

Website tools: http://ping.chinaz.com/
Aizhan: https://ping.aizhan.com/

==If there are multiple different response IPs, it indicates that there may be a CDN==.

  • nslookup

If the domain resolves to multiple IP addresses, it is likely using a CDN.
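As a quick sketch of that check: count the distinct answer addresses in the resolver output. The sample below is hardcoded for illustration (real output would come from `nslookup example.com`, and exact formatting varies by platform).

```shell
# Count distinct answer IPs in nslookup-style output; more than one
# usually suggests a CDN. sample_output is hardcoded for illustration.
sample_output='Name: www.example-cdn.com
Address: 104.16.1.1
Name: www.example-cdn.com
Address: 104.16.2.2'
ip_count=$(printf '%s\n' "$sample_output" | awk '/^Address/ {print $2}' | sort -u | wc -l | tr -d ' ')
if [ "$ip_count" -gt 1 ]; then
  echo "multiple IPs: possible CDN"
else
  echo "single IP: CDN less likely"
fi
```

Note this is only a hint, not proof: round-robin DNS without a CDN can also return multiple addresses.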

02. How to Find the Real IP (Bypassing CDN)#

1. Look for Subdomain IPs#

Subdomains may be on the same server or in the same C-class network as the main site. By querying the IP information of subdomains, you can assist in determining the real IP information of the main site.

See below ==4. Subdomain Information Collection==.

2. Check Historical DNS Resolution Information#

Check the historical records of IP-to-domain bindings. There may be records from ==before the CDN was introduced==; then ==compare them against the current CDN resolution IPs==: addresses not in that set ==may be the real IP, without CDN acceleration==.

  • viewdns.info DNS historical record website, which records changes over the years.
Syntax: domain:baihe.com type:A

Just enter the website domain in the search field and press Enter. The "Historical Data" can then be found in the left menu.

  • Cloudflare's Advice

==A, AAAA, CNAME, or MX records pointing to your origin will expose your original IP.==

So you can check the DNS resolution records corresponding to the domain.

3. Use Foreign Hosts for Direct Detection#

If you don't have a host abroad, you can use public multi-location ping services instead. These services have detection nodes overseas, and ==the ICMP responses returned from the foreign nodes== can be used to determine the real IP (many sites deploy CDN nodes only domestically, so a foreign node may reach the origin directly).

  • Foreign node ping addresses

https://ping.chinaz.com/

http://www.webkaka.com/Ping.aspx

4. Check the Email Server IP from Emails Received#
  • RSS email subscriptions: many websites run sendmail and will send us emails; viewing the raw email source then reveals the server's real IP.
  • If the target system has a mailing function, it usually sends emails during user registration/password recovery, etc. By checking the original email sent by the system, you can view the sender's IP address.


  • DNS's MX records (see point 2 above).
5. Certificate Query#

https://crt.sh/

The principle: send a Client Hello to the IP's port 443. The server replies with a Server Hello carrying its SSL certificate, and the ==Common Name in the SSL certificate contains domain information==; this tells you which domain resolves to the IP. So, more precisely, it is the IP's port 443 that may expose the domain.
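A minimal offline demonstration of the principle: generate a throwaway self-signed certificate for a made-up domain, then read its Common Name the same way you would read it from a live server's certificate. Against a real host you would instead run `echo | openssl s_client -connect IP:443 | openssl x509 -noout -subject`; the domain `example-target.com` here is purely hypothetical.

```shell
# Generate a throwaway self-signed cert with a hypothetical CN (offline demo)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/demo_key.pem -out /tmp/demo_cert.pem \
  -subj "/CN=example-target.com" 2>/dev/null

# Extract the subject CN, as you would from a certificate fetched off port 443
openssl x509 -in /tmp/demo_cert.pem -noout -subject
```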

https://search.censys.io/# Check historical certificates.

Syntax:

parsed.names: 4399.com and tags.raw: trusted
To show only valid certificates, add the parameter: tags.raw: trusted

Censys will show you all standard certificates that meet the above search criteria. The above certificates were found during scanning.

Just click on any certificate.

6. Use zmap to Capture Target IP Segment 80 Banner Information#

Randomly scan 10,000 IPs on port 80.

zmap -B 10M -p 80 -n 10000 -o results.csv

Loop over the discovered IPs and use curl to record each banner, tagging every response with its source IP so it can be matched later:

for i in $(zmap -B 10M -p 80 -n 10000); do echo "== $i" >> out1; curl -s -I "$i" >> out1; done

Then look for an IP whose ==banner matches== the target domain's; that IP is likely the real IP.
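A sketch of the matching step, assuming each banner line has been tagged with its source IP (the sample data below is hardcoded for illustration; in practice it comes from the zmap/curl loop):

```shell
# Find IPs whose Server banner matches the target's (sample data, not real scans)
target_banner='Server: nginx/1.18.0'
printf '1.2.3.4 Server: nginx/1.18.0\n5.6.7.8 Server: Apache/2.4.41\n' > /tmp/banners.txt
match_ip=$(grep -F "$target_banner" /tmp/banners.txt | awk '{print $1}')
echo "$match_ip"
```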

7. Domain Tweaking#

In the past, a common habit when deploying a CDN was to put only the www domain behind it while leaving the naked (apex) domain uncached, to make the site easier to maintain without waiting on the CDN cache. So try removing the www from the target domain and ping it to see whether the IP changes.

8. Social Engineering#

If you have obtained the target website administrator's CDN account, you can find the website's real IP in the CDN configuration.

2. Side Site Information Collection#

Side sites are different websites on the ==same server as the attack target==. When the target itself has no vulnerabilities, you can look for vulnerabilities in a side site, compromise it, and then escalate privileges to gain the highest permissions on the server.

  • nmap port scanning
nmap -sV -p- real_ip -v -oN xxx.txt
  • Online query websites

https://www.webscan.cc/

https://stool.chinaz.com/same

3. C-Class Information Collection#

C-class hosts refer to servers that are ==in the same C-class network as the target server==. The live hosts in the target's C-class are important information for information gathering. Many internal servers of units and enterprises may be in the same C-class network.

  • nmap
nmap -sn real_IP/24 -v -oN xxx.txt

You can also add -n (no DNS resolution): it tells Nmap never to perform reverse DNS resolution on the active IP addresses it discovers. Since DNS is generally slow, this speeds things up.

  • Use Google, syntax: site:125.125.125.*
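A small helper for deriving the /24 (C-class) network from a discovered real IP before scanning it (192.0.2.57 is a placeholder address):

```shell
# Derive the C-class (/24) network from a host IP
ip=192.0.2.57
cnet=$(echo "$ip" | awk -F. '{print $1"."$2"."$3".0/24"}')
echo "$cnet"    # then feed this to: nmap -sn "$cnet" -v -oN xxx.txt
```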

4. Subdomain Information Collection#

01. Subdomain Bruteforce Tools#

OneForAll, a Python tool, requires Python 3.6.0 or higher. Run with default parameters, it writes its results to the results directory. Install dependencies before use: pip install -r requirements.txt.

python3 oneforall.py --target example.com run
python3 oneforall.py --targets ./example.txt run
  • JSFinder: see below ==9.06==.
  • ESD (Download from GitHub, but I encountered errors using it).
# Scan a single domain
esd -d qq.com
  • subfinder (Download from GitHub, requires Go language).
subfinder -d hackerone.com

Used with httpx, it can find running HTTP servers (httpx is written in Go).

echo 4399.com | subfinder -silent | httpx -ip > subdomain_list
httpx --silent only outputs the domain.

02. Online Query Websites#

  • ==Search Engines to Discover Subdomains==
Baidu Search Engine
site:baidu.com

Google Search Engine
site:baidu.com

https://fofa.info/
https://www.shodan.io/
https://x.threatbook.com/v5/mapping

https://dnsdumpster.com/

https://www.dnsdb.io/zh-cn/ Useful but requires membership for extensive use

Input baidu.com type:A.


5. Determine Operating System Type and Version#

  • nmap
nmap -O 192.168.88.21
  • Check whether the website URL is case-sensitive (case-insensitive suggests Windows; case-sensitive suggests Linux).

  • Windows TTL value is generally 128 (or >100), while Linux is 64.
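A sketch of the TTL heuristic (the ping line below is hardcoded for illustration; against a live host you would parse `ping -c 1 target`). The thresholds follow the rule of thumb above and can be skewed by intermediate hops:

```shell
# Guess the OS from the TTL in a ping reply (sample line, not a live ping)
ping_line='64 bytes from 192.0.2.1: icmp_seq=1 ttl=128 time=3.10 ms'
ttl=$(printf '%s\n' "$ping_line" | grep -o 'ttl=[0-9]*' | cut -d= -f2)
if [ "$ttl" -gt 100 ]; then
  guess="likely Windows"
else
  guess="likely Linux/Unix"
fi
echo "$guess"
```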

6. Website Owner Information Collection#

Helpful for dictionary creation.

01. whois#

Whois (pronounced "who is"; not an abbreviation) is a protocol for querying the IP and ownership information of a domain. Put simply, whois is a database for checking whether a domain has been registered and for retrieving a registered domain's details (such as the ==domain owner and domain registrar==). Early whois queries were mostly run from the command line; today, web-based tools simplify online lookups and can query multiple databases at once. These web tools still rely on the whois protocol to send queries to servers, and command-line clients remain widely used by system administrators. Whois typically runs over TCP on port 43.

==The WHOIS information for each domain or IP is maintained by the corresponding management organization==. For example, the WHOIS information for .com domains is managed by the .com domain operator VeriSign, while the national top-level domain .cn in China is managed by CNNIC.


02. Social Engineering#

Assuming we have obtained information through the target's colleagues, such as the target's real name, contact information, work hours, etc. *A skilled social engineer will organize, classify, and filter the information to construct a carefully prepared trap, allowing the target to walk into it.*

03. Personal Information Retained by Official Websites#

Generally, companies will place official contact information on their official websites, which can be used to collect email and phone information.

04. Recruitment Information Collection#

Recruitment information on job websites contains a lot of personnel-related information. Recruitment information involves electronic mail, phone numbers, and other related information of the recruited personnel, while job seekers' resumes contain very detailed personal information such as names, phone numbers, emails, and work experience. If there are security vulnerabilities on the recruitment website, job seekers' resumes may be leaked.

05. ICP Filing Information#

Reveals company information and the filing review date.

https://icp.chinaz.com/

https://beian.miit.gov.cn/#/Integrated/index

06. Exposed Locations#

Same effect as the certificate query method described earlier under Real IP (==5. Certificate Query==).

  • View individual certificate information.

https://crt.sh/

https://search.censys.io/#

07. Check Company Information#

https://www.qcc.com/

https://www.tianyancha.com/

https://tool.chinaz.com/

08. Obtain Email Information#

https://www.skymem.info/

09. Others#

(1) Look for usernames directly on the web (as they generally have emails, you can get usernames based on company names or numbers to generate corresponding dictionaries).
(2) Use Google syntax to search for xlsx, etc., or directly search for this company-related information, which may reveal usernames.
(3) Check GitHub for this company to see if there are any leaks.
(4) Look for interviewers on job websites, as they may leak phone numbers and usernames, and check usernames based on phone numbers.
(5) Search for the company's organizational chart and note down any leaders.
(6) Use public accounts, Weibo, and other social media to search for company information.
(7) Use Baidu Images. This depends on luck: web searches sometimes return too many results, and browsing image results can surface usernames faster. (I thought of this during a previous attack-defense exercise when I needed to read a number from an image, but it was too blurred to make out.)
(8) Look for commonly used username dictionaries for collection.

7. Identify CMS#

A Content Management System (CMS) is a system for managing website content. CMSs offer many excellent ==ready-made templates==, ==which speed up website development and reduce its cost==. A CMS is not limited to text: it can also handle images, Flash animations, audio and video streams, graphics, and even email archives. "CMS" is really a broad term, covering everything from simple blog and news-publishing programs to comprehensive website management systems.

01. Manual Identification#

  • ==The footer may expose the CMS==
powered by ...
  • ==robots.txt file==

The specific paths it discloses can identify the CMS.

  • ==Response Header Information==

For example, the cookie names may be CMS-specific.

  • ==Website Backend==

The website's backend login interface also has characteristic codes of the CMS.

  • Determine based on URL routing, such as wp-admin.

02. Fingerprint Recognition Tools#

The main development idea: establish a connection and send a request, fetch the page content, match keywords with regular expressions, and identify the CMS type.
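A minimal sketch of that idea. The sample HTML is hardcoded; in practice it would come from `curl -s http://target/`, and real tools match far more fingerprints than a generator tag:

```shell
# Match CMS keywords in fetched page content (sample HTML, not a live fetch)
html='<meta name="generator" content="WordPress 6.2" />'
case "$html" in
  *WordPress*) cms="WordPress" ;;
  *Joomla*)    cms="Joomla" ;;
  *Drupal*)    cms="Drupal" ;;
  *)           cms="unknown" ;;
esac
echo "CMS: $cms"
```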

  • ==Chrome Extension -- Wappalyzer==
  • Common tools include CMSeek.


03. Online CMS Recognition Websites#

http://whatweb.bugscaner.com/look/

8. Identify Web Middleware#

  • Response headers.
  • Determine based on error messages.
  • Determine based on default pages.

9. Internet Asset Collection#

Includes historical vulnerability information, GitHub source code leaks, SVN source code information, leaked cloud disk file information, etc.

01. Historical Vulnerability Information#

Google search for relevant software vulnerabilities.

02. GitHub Source Code Information Leaks#

GitHub is a hosting platform for open-source and private software projects, and many people like to upload their code to the platform. ==Attackers can search using keywords== to find ==sensitive information about the target site==, and even download the website source code.

When developers use git for version control, after initializing a repository in a directory, a hidden folder named .git is created in that directory, which contains all versions and a series of information about the repository. ==If the server places the .git folder in the web directory==, it may allow attackers to obtain all source code of the application using the information inside the .git folder.
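A quick offline illustration of the check: a readable `.git/HEAD` file begins with `ref:`. Against a web target, the equivalent probe would be `curl -s http://target/.git/HEAD` (URL hypothetical); a response with that signature usually means the repository can be pulled.

```shell
# Simulate an exposed .git directory locally and test the HEAD file signature
mkdir -p /tmp/demo_site/.git
echo 'ref: refs/heads/main' > /tmp/demo_site/.git/HEAD
if head -n 1 /tmp/demo_site/.git/HEAD | grep -q '^ref: '; then
  echo ".git HEAD signature found: repository likely downloadable"
fi
```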

  • GitHub syntax search
`in:name vue`: matches repositories with "vue" in the name.
`in:description vue`: matches repositories with "vue" in the name or description.
`in:readme vue`: matches repositories that mention "vue" in the README.
`repo:owner/name`: matches a specific repository, e.g. `repo:biaochenxuying/blog` (user biaochenxuying's blog project).

For more details on search syntax, see

https://github.com/FrontEndGitHub/FrontEndGitHub/issues/4

  • GitHack, to pull source code.

A `.git` folder disclosure exploit

03. Backup Site Compressed Packages#

Attempt to obtain through directory scanning.

04. SVN#

You can use the .svn/entries file to obtain server source code, SVN server account passwords, and other information. A more serious issue is that the .svn directory generated by SVN also contains source code file copies ending with .svn-base (for lower versions of SVN, the specific path is the text-base directory, while for higher versions, it is the pristine directory). If the server does not parse such suffixes, hackers can directly obtain the source code files.

Details

https://cloud.tencent.com/developer/article/1376492

  • Source code restoration tool

05. DNS Information Leaks#

A. MX Record Leaks

https://dnsdumpster.com/

https://www.robtex.com/

https://mxtoolbox.com/

06. API Leaks#

JSFinder Tampermonkey Script


07. Other Sensitive Files#

First check which CMS is being used, and then scan according to that CMS's directory structure.

If no CMS is used, use conventional sensitive file name dictionaries for scanning, such as:

  • robots.txt
  • crossdomain.xml
  • sitemap.xml
  • xx.tar.gz
  • xx.bak
  • phpinfo

Lingfengyun Search

https://www.lingfengyun.com/

Xiaobaipan Search

Dali Pan Search

Xiaobudian Search (Weipan)

Baidu Cloud Disk Crawling Open Source Tool

Google search for relevant middleware information leaks.

10. WAF Identification#

WAF Functions

A WAF can often be identified from the characteristics of its block page:

https://blog.csdn.net/weixin_46676743/article/details/112245605

Tools

  • WAFW00f

Or manually send malformed URIs, SQL statements, and XSS payloads to see whether they trigger WAF alerts.

  • nmap -p 80 --script http-waf-detect.nse 4399.com
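A sketch of interpreting such a manual probe. The status code is hardcoded here; in practice it would come from something like `code=$(curl -s -o /dev/null -w '%{http_code}' "http://target/?q=<script>")` (URL hypothetical), and which codes indicate a block varies by WAF:

```shell
# Interpret the HTTP status code of a malicious-looking probe (sample value)
code=406
case "$code" in
  403|406|501) verdict="request blocked: WAF likely" ;;
  200)         verdict="served normally: no obvious WAF" ;;
  *)           verdict="inconclusive" ;;
esac
echo "$verdict"
```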

11. Port Scanning#

How to scan all ports:

nmap is thorough but slow.

nmap -sV -Pn -p- 1.1.1.1 -oX result.xml

masscan is fast but sometimes inaccurate.

masscan --open --banners -p- 1.1.1.1 --rate 1000 -oX result.xml

Common Port Vulnerability Information Table

| Port Number | Service | Attack Methods |
| --- | --- | --- |
| 21/22/69 | ftp/tftp | Brute force, sniffing, overflow, backdoor |
| 22 | ssh | Brute force, 28-backspace vulnerability |
| 23 | telnet | Brute force, sniffing |
| 25 | smtp | Email forgery, brute force |
| 53 | dns | DNS zone transfer, DNS hijacking, DNS cache poisoning, DNS spoofing, DNS tunneling |
| 67/68 | dhcp | Hijacking, spoofing |
| 110 | pop3 | Brute force |
| 139 | samba | Brute force, unauthorized access, remote code execution |
| 143 | imap | Brute force |
| 161 | snmp | Brute force |
| 389 | ldap | Injection, unauthorized access |
| 512/513/514 | linux r services | Direct rlogin |
| 873 | rsync | Unauthorized access |
| 1080 | socks | Brute force, internal penetration |
| 1352 | lotus | Brute force, weak passwords, information leakage (source code) |
| 1433 | mssql | Brute force, injection |
| 1521 | oracle | Brute force, injection, TNS remote poisoning |
| 2049 | nfs | Misconfiguration |
| 2181 | zookeeper | Unauthorized access |
| 3306 | mysql | Brute force, injection, denial of service |
| 3389 | rdp | Brute force, shift backdoor |
| 4848 | glassfish | Brute force, console weak passwords, authentication bypass |
| 5000 | sybase/db2 | Brute force, injection |
| 5432 | postgresql | Brute force, weak passwords, injection, buffer overflow |
| 5632 | pcanywhere | Denial of service, code execution |
| 6379 | redis | Unauthorized access, brute force, weak passwords |
| 7001 | weblogic | Deserialization, console weak passwords, webshell deployment via console |
| 8069 | zabbix | Remote command execution |
| 8080-8090 | web | Common web attacks, brute force, middleware vulnerabilities, CMS version vulnerabilities |
| 9090 | websphere | Brute force, console weak passwords, deserialization |
| 9200/9300 | elasticsearch | Remote code execution |
| 11211 | memcached | Unauthorized access |
| 27017 | mongodb | Brute force, unauthorized access |

12. Directory Scanning#

13. APP Information Collection#

14. Other Information Collection Channels#

  • Zhihu
  • Tieba
  • Social Engineering Database
  • Telegram

  • Anonymous email senders: http://tool.chacuo.net/mailanonymous and https://emkei.cz/
  • Temporary email: http://www.yopmail.com/
  • Email pool: http://veryvp.com/

Methods to Prevent Information Gathering#

If website administrators want to prevent their sites from being profiled during a hacker's preliminary reconnaissance, they can modify the site's identifying characteristics:
(1) Modify displayed page information (page templates, "technical support" text, keywords, version information, backend login module information, etc.).
(2) Modify well-known page paths (/robots, /admin, etc.).
(3) Personalize path names: use pinyin abbreviations or other custom names to hide the site builder's generic paths, for example changing /admin to /a8min.
