Scripts | Codes

Showing posts with label links.

[BASH] Extract links from a file and download them

bash, cat, command, extensions, extract, grep, links, one line, perl, regex, shell, wget

Extracts links from a file (images, HTML pages, PDFs, ...) and either displays them in the terminal or downloads them with wget, all in a single line.

You can choose which extensions to extract by listing them inside the last pair of parentheses, separated by |.

In the example, we extract the image links from an index.php file and then download them.

It uses a Perl-style regular expression, enabled with grep's -P option.

grep -oP 'https?://[\w./-]+\.(gif|jpg|png|bmp)' index.php | sort -u | wget -i -
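To preview the links in the terminal before downloading anything, the grep step can be run on its own. The sketch below assumes GNU grep (needed for -P) and uses a variant of the pattern that also accepts https. The function name `extract_links`, the file `sample.html`, and the example.com URLs are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the links matching a set of extensions, without downloading.
# extract_links, sample.html and the example.com URLs are illustrative assumptions.

# usage: extract_links FILE 'ext1|ext2|...'
extract_links() {
  local file=$1 exts=$2
  # -o prints only the matched text, -P enables Perl-compatible regex (GNU grep)
  grep -oP "https?://[\w./-]+\.(${exts})" "$file" | sort -u
}

# build a small sample file to try it on
cat > sample.html <<'EOF'
<img src="http://example.com/img/logo.png">
<a href="http://example.com/docs/guide.pdf">guide</a>
<img src="https://example.com/img/photo-1.jpg"> <img src="http://example.com/img/logo.png">
EOF

extract_links sample.html 'gif|jpg|png|bmp'   # image links only, duplicates removed
extract_links sample.html 'pdf'               # PDF links only
```

Once the list looks right, piping the function's output into `wget -i -` downloads every link, since `-i -` tells wget to read URLs from standard input.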

[PHP] Crawler | Recursive link extractor

crawler, extract, links, php, sites

Extracts the links from each page listed in links.dat, then finds the links in those new pages, and so on.

You need to create a links.dat file in the same directory as the script and put the starting links in it, one per line. Newly found links are appended to the same file.

<?php
//################################################
// for more codes scripts-n-codes.blogspot.com
//################################################
//
// put the links to crawl in a links.dat file, one per line; a single site URL is enough to start
//
$datafile = "links.dat"; // file to keep the list of links in
$regex = "/<\s*a\s+[^>]*href\s*=\s*[\"']?([^\"' >]+)[\"' >]/isU";  // regex to search for hrefs

// read the known links into an array, dropping newlines and blank lines
$oldlinks = file($datafile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($oldlinks === false) {
 die("Cannot read $datafile\n"); // nothing to crawl without the data file
}

foreach ($oldlinks as $link) { // for every link read from the file
 print $link . "\n"; // print it out
 $html = @file_get_contents(trim($link)); // read in the whole remote page
 if ($html === false) { // skip pages that cannot be opened instead of aborting
  print("Cannot open $link\n");
  continue;
 }
 if (preg_match_all($regex, $html, $links)) { // if we find links
  $local = fopen($datafile, "a"); // open the data file for appending
  foreach (array_unique($links[1]) as $found) { // for every distinct link on the page
   if (!in_array($found, $oldlinks)) { // if we haven't seen it before (nb - case sensitive)
    print($found . "\n"); // print it out
    fwrite($local, $found . "\n"); // and write it to file
    $oldlinks[] = $found; // remember it so it is not written twice
   }
  }
  fclose($local); // close the data file
 }
 else {
  print("No links.\n"); // we didn't find any links in this page
 }
}
?>
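The crawler's bookkeeping step (keep only the links that are not already in the list, then append them) can be tried without any network access. The shell sketch below is an illustration of that step only, not a port of the PHP script. It assumes GNU grep, and it simplifies the pattern to match any `href` attribute rather than only those inside `<a>` tags. The names `append_new_links`, `page.html`, and `demo-links.dat` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the crawler's bookkeeping step only (no network access):
# pull href values out of saved HTML and append the unseen ones to a link list.
# append_new_links, page.html and demo-links.dat are illustrative assumptions.

# usage: append_new_links HTML_FILE LINKS_FILE
append_new_links() {
  local html=$1 datafile=$2 link
  # \K drops the 'href=' prefix from the match, so only the URL is printed (GNU grep -P)
  grep -oiP "href\s*=\s*[\"']?\K[^\"' >]+" "$html" | sort -u |
  while IFS= read -r link; do
    # -x whole-line match, -F fixed string: an exact, case-sensitive comparison
    if ! grep -qxF -- "$link" "$datafile"; then
      echo "$link"                 # print the new link
      echo "$link" >> "$datafile"  # and record it in the list
    fi
  done
}

# a link list with one known URL, and a saved page that links to three
printf '%s\n' 'http://example.com/' > demo-links.dat
cat > page.html <<'EOF'
<a href="http://example.com/">home</a>
<a class="nav" href='http://example.com/about'>about</a>
<a href=http://example.com/contact>contact</a>
EOF

append_new_links page.html demo-links.dat
```

Running the function a second time on the same page prints nothing, because every link is already recorded in the list.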
