Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Qiang Han committed Feb 3, 2012 (0 parents, commit 2ef0395)
Showing 23 changed files with 962 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
My suggestion is to read: | ||
/RSA_src/src_code/ReadMe.txt | ||
/SubstututionCipher_src/src_code/ReadMe.txt | ||
to see what is this all about. Enjoy! |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
111648099719936040704258821388118238277748025510501456817634394522881645789192423977764620842223806117626082463531207631745109853090350440222075792743 126726202508125265119947457871803337768944747820054482628554989006698834377844301173821151768505865533968137619846588028267787398981400439239772358437 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
44185254783815481731160894001616397991204124951262480535862790947216769700465232634923636713731668150207240962597345690119732824254213983282701761207 126726202508125265119947457871803337768944747820054482628554989006698834377844301173821151768505865533968137619846588028267787398981400439239772358437 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Given large data sets, whether they are text or numeric, it is often useful to group together, or cluster, similar items automatically. For instance, given all of the news for the day from all of the newspapers in the United States, you might want to group all of the articles about the same story together automatically; you can then choose to focus on specific clusters and stories without needing to wade through a lot of unrelated ones. Another example: Given the output from sensors on a machine over time, you could cluster the outputs to determine normal versus problematic operation, because normal operations would all cluster together and abnormal operations would be in outlying clusters.Like CF, clustering calculates the similarity between items in the collection, but its only job is to group together similar items. In many implementations of clustering, items in the collection are represented as vectors in an n-dimensional space. Given the vectors, one can calculate the distance between two items using measures such as the Manhattan Distance, Euclidean distance, or cosine similarity. Then, the actual clusters can be calculated by grouping together the items that are close in distance.There are many approaches to calculating the clusters, each with its own trade-offs. Some approaches work from the bottom up, building up larger clusters from smaller ones, whereas others break a single large cluster into smaller and smaller clusters. Both have criteria for exiting the process at some point before they break down into a trivial cluster representation (all items in one cluster or all items in their own cluster). Popular approaches include k-Means and hierarchical clustering. As I'll show later, Mahout comes with several different clustering approaches.The goal of categorization (often also called classification) is to label unseen documents, thus grouping them together. 
Many classification approaches in machine learning calculate a variety of statistics that associate the features of a document with the specified label, thus creating a model that can be used later to classify unseen documents. For example, a simple approach to classification might keep track of the words associated with a label, as well as the number of times those words are seen for a given label. Then, when a new document is classified, the words in the document are looked up in the model, probabilities are calculated, and the best result is output, usually along with a score indicating the confidence the result is correct.Features for classification might include words, weights for those words (based on frequency, for instance), parts of speech, and so on. Of course, features really can be anything that helps associate a document with a label and can be incorporated into the algorithm.The field of machine learning is large and robust. Instead of focusing further on the theoretical, which is impossible to do proper justice to here, I'll move on and dive into Mahout and its usage.Apache Mahout is a new open source project by the Apache Software Foundation with the primary goal of creating scalable machine-learning algorithms that are free to use under the Apache license. The project is entering its second year, with one public release under its belt. Mahout contains implementations for clustering, categorization, CF, and evolutionary programming. Furthermore, where prudent, it uses the Apache Hadoop library to enable Mahout to scale effectively in the cloud.These components and their implementations make it possible to build out complex recommendation systems for either real-time-based recommendations or offline recommendations. Real-time-based recommendations often can handle only a few thousand users, whereas offline recommendations can scale much higher. Taste even comes with tools for leveraging Hadoop to calculate recommendations offline. 
In many cases, this is a reasonable approach that allows you to meet the demands of a large system with a lot of users, items, and preferences.To demonstrate building a simple recommendation system, I need some users, items, and ratings. For this purpose, I randomly generated a large set of Users and Preferences for the Wikipedia documents (Items in Taste-speak) using the code in cf.wikipedia.GenerateRatings (included in the source with the sample code) and then supplemented this with a set of hand-crafted ratings around a specific topic (Abraham Lincoln) to create the final recommendations.txt file included in the sample. The idea behind this approach is to show how CF can guide fans of a specific topic to other documents of interest within the topic. In the example data are 990 (labeled 0 to 989) random users who have randomly assigned ratings to all the articles in the collection, and 10 users (labeled 990 to 999) who have rated one or more of the 17 articles in the collection containing the phrase Abraham Lincoln.To start, I'll demonstrate how to create recommendations for a user given the set of ratings in recommendations.txt. As is the case with most uses of Taste, the first step is to load the data containing the recommendations and store it in a DataModel. Taste comes with several different implementations of DataModel for working with files and databases. For this example, I'll keep things simple and use the FileDataModel class, which expects each line to be of the form: user ID, item ID, preference where both the user ID and the item ID are strings, while the preference can be a double. Given a model, I then need to tell Taste how it should compare users by declaring a UserSimilarity implementation. Depending on the UserSimilarity implementation used, you might also need to tell Taste how to infer preferences in the absence of an explicit setting for the user. 
Listing 1 puts all of these words into code.First and foremost, clustering algorithms require data that is in a format suitable for processing. In machine learning, the data is often represented as a vector, sometimes called a feature vector. In clustering, a vector is an array of weights that represent the data. I'll demonstrate clustering using vectors produced from Wikipedia documents, but the vectors can come from other areas, such as sensor data or user profiles. Mahout comes with two Vector representations: DenseVector and SparseVector. Depending on your data, you will need to choose an appropriate implementation in order to gain good performance. Generally speaking, text based problems are sparse, making SparseVector the correct choice for them. On the other hand, if most values for most vectors are non-zero, then a DenseVector is more appropriate. If you are unsure, try both and see which one works faster on a subset of your data. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
South Korea and the United States began assembling ships for joint war exercises Sunday off the west coast of the Korean peninsula in the Yellow Sea, a source at the South Korean Joint Chiefs told CNN.The exercises are set to begin as diplomats worked to ease tensions in the Koreas after North Korea warned of unpredictable "consequences" if the United States fulfills its vow of deploying an aircraft carrier to the Yellow Sea for joint military maneuvers with South Korea.China's foreign minister spoke with his Russian, U.S., and Japanese counterparts, and a Chinese representative visited Seoul as envoys underscored the need to lower the temperature in the longtime flash-point region, days after four South Koreans died when North Korea shelled Yeonpyeong Island.North Korea said the South provoked the Tuesday attack because shells from a South Korean military drill landed in the North's waters. South Korea was holding its annual Hoguk military drill when the North started its shelling, and the South returned fire.North Korea's official Korean Central News Agency on Saturday slammed South Korea and the United States for provoking the crisis.It called reports of civilian casualties part of South Korea's "propaganda campaign" and accused the "enemy" of creating "a human shield by deploying civilians around artillery positions and inside military facilities before the launch of the provocation.""If the U.S. brings its carrier to the West Sea of Korea at last, no one can predict the ensuing consequences," said KCNA, referring to the aircraft carrier USS George Washington, which is set to join South Korea's forces near the coasts of China and North Korea for the four-day military drill scheduled to start Sunday.U.S. State Department spokeswoman Nicole Thompson called the claims "outrageous.""This is just another example of North Korea's own internal propaganda. The North Koreans for many years, including the Cheonan warship incident, have taken provocative action. 
This didn't have anything to do with U.S. actions," Thompson told CNN, referring to the sinking of a South Korean ship in March that left 46 people on board dead.The United States and South Korea blame the sinking on the North, which has consistently denied responsibility.Diplomats, seeking a lessening of tensions and a return to the six-party talks with North Korea over the country's nuclear aspirations, busily labored to avert more hostilities. The United States, China, Japan, Russia, South Korea and North Korea are the six countries that have been involved in the talks, which were put on hold in 2008."These parties should call on the DPRK and South Korea to exercise calmness and restraint and hold dialogue and make contacts, and not to take actions that would escalate the conflict," China's official Xinhua news agency quoted Chinese Foreign Minister Yang Jiechi as saying. China is North Korea's largest trading partner.Yang and Russian Foreign Minister Sergey Lavrov "stressed the need to prevent the situation from exacerbating and to work toward relieving the tensions," according to the Russian Foreign Ministry.Xinhua reported that Japanese Foreign Minister Seiji Maehara said his country "is willing to work together with China to joint safeguard peace and stability on the Korean peninsula."And a Twitter message from U.S. State Department spokesman P.J. Crowley said Secretary of State Hillary Clinton spoke with Yang on Friday and "encouraged Beijing to make clear that North Korea's behavior is unacceptable."Meanwhile, Dai Bingguo, a Chinese state councilor, sat down with South Korean Foreign Minister Kim Sung-hwan in Seoul to discuss the tensions.The violence has sparked anger and political turmoil in South Korea. 
The country's defense minister, Kim Tae-young resigned after the exchange of fire, and veterans of the South Korean military protested Saturday on the streets of Seoul, stating they were angry that their country's government had not done enough to respond to the North's shelling.One group of protesters gathered near the defense ministry building Saturday, clashing with police officers with some charging and kicking officers.There will be no live firing element in the drills; live firing exercises can only take place in a designated training range or in a closed-off area at sea, Cmdr. Jeff Davis, public affairs officer for the U.S. 7th Fleet, and such firing exercises are not possible given the amount of traffic in the area. | ||
The drills will include anti-air attack and anti-surface-attack exercises, communications and data drills, expert exchanges, logistical support, and replenishment drills. For example, a Korean oil tanker will refuel a U.S. ship, Davis said. | ||
But the prospect of more violence has prompted alarm across the region. Japan's Kyodo news agency reported that Japanese "Cabinet members have been ordered to stay in Tokyo until Wednesday and be at their ministry offices within an hour in the event emergency situations develop." | ||
South Korea said Thursday that it will strengthen its rules of engagement in the Yellow Sea. South Korean marine forces based in five islands near North Korea and the disputed Northern Limit Line also will be reinforced, a government spokesman said.The Yeonpyeong attack was the first direct artillery assault on South Korea since 1953, when an armistice ended fighting, though both Koreas are still technically at war.Most people are curious at what type of war it will be??? Considering both sides have extremely strong Air Defense assets - air attack will be secondary. To go across the DMZ they would have to cross the largest minefield in the world, so maneuver will probably be secondary (no one is going to use those tunnels because it would be suicide). What you will see is the largest Field Artillery Battle ever in history. Cannons (ground and sea), rockets, and missiles (including patriots). The surfaces will be demolished, but both countries have large underground infrastructures for this type of battle. China will not enter the war because they have about 150 billion in trade with South Korea, but they also do not want a U.S. Friendly government at their door stepPeople who buy the iPad online Friday will save $41, making the lowest price for the device $458 for a model without a 3G connection and with 16 gigabytes of storage space. The top-end model, with a 3G connection and 64 gigabytes of storage, is also discounted $41, making it $788.Those one-day sale prices probably still sound expensive to many consumers, but the Black Friday sale is getting buzz from tech blogs because Apple rarely discounts the price of its high-end electronics and because the iPad is said to be one of the most sought-after gifts of the year.The site is offering free shipping on orders of $50 or more. 
The online price cut highlights a trend toward online sales and discounts this holiday season, but the Apple discounts were also available in Apple's retail stores Friday, according to two Apple store employees phoned by CNN. | ||
At least one other retailer appeared to have discounted the iPad, too. At least one T.J. Maxx store was selling the base-level iPad for $400, or $58 less than the price listed at Apple.com, according to the tech blog Engadget, which posted apparent photos showing the in-store-only sale. The same photo was posted by a user on the store's website. | ||
An employee who answered the phone Friday morning at that T.J. Maxx store, in Mount Vernon, New York, told CNN the $400 iPads were sold out. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Where there is a will there is a way. |