switch names, fix tests, add start of experimental formatter
ironholds committed Aug 21, 2015
1 parent efb6d7c, commit 706afe8
Showing 11 changed files with 210 additions and 28 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
# Generated by roxygen2 (4.1.1): do not edit by hand | ||
|
||
-export(humaniformat) | ||
+export(parse_names) | ||
importFrom(Rcpp,sourceCpp) | ||
useDynLib(humaniformat) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,35 @@ | ||
# This file was generated by Rcpp::compileAttributes | ||
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 | ||
|
||
#' @title Parse Human Names | ||
#' @description human names are complex things; sometimes people have honorifics, or not. Or a single middle name, or many. Or | ||
#' a compound surname, or not a compound surname but 'PhD' at the end of their name, and augh. | ||
#' | ||
#' \code{parse_names} provides a simple | ||
#' function for taking consistently formatted human names and splitting them into \code{salutation}, \code{first_name}, | ||
#' \code{middle_name}, \code{last_name} and \code{suffix}. It is capable of dealing with compound surnames, multiple middle names, | ||
#' and similar variations, and is fully vectorised. | ||
#' | ||
#' @param names a character vector of names to parse. | ||
#' | ||
#' @return a data.frame with the columns \code{salutation}, \code{first_name}, | ||
#' \code{middle_name}, \code{last_name}, \code{suffix} and \code{full_name} (which contains the original name). In the | ||
#' event that a name doesn't \emph{have} a salutation, middle name, suffix, or so on, an empty string will be in that | ||
#' field instead. | ||
#' | ||
#' @examples | ||
#' # Parse a simple name | ||
#' parse_names("Oliver Keyes") | ||
#' | ||
#' # Parse a more complex name | ||
#' parse_names("Hon. Oliver Timothy Keyes Esq.") | ||
#' | ||
#' @export | ||
-humaniformat <- function(names) { | ||
-.Call('humaniformat_humaniformat', PACKAGE = 'humaniformat', names) | ||
+parse_names <- function(names) { | ||
+.Call('humaniformat_parse_names', PACKAGE = 'humaniformat', names) | ||
} | ||
|
||
format_names <- function(names) { | ||
.Call('humaniformat_format_names', PACKAGE = 'humaniformat', names) | ||
} | ||
|
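To make the documented split concrete, here is a minimal standalone C++ sketch of dividing one name into `salutation`, `first_name`, `middle_name`, `last_name` and `suffix`. It is not the package's `human_parse` implementation, which this diff does not show. The struct `ParsedName`, the function `parse_name_sketch` and both lookup lists are hypothetical, and the sketch deliberately ignores compound surnames.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container mirroring the documented data.frame columns.
struct ParsedName {
  std::string salutation, first_name, middle_name, last_name, suffix, full_name;
};

// Hypothetical lookup lists; the package's real lists are not part of this diff.
static const std::set<std::string> kSalutations = {"Mr.", "Mrs.", "Ms.", "Dr.", "Hon."};
static const std::set<std::string> kNameSuffixes = {"PhD", "Esq.", "Jr.", "Sr."};

ParsedName parse_name_sketch(const std::string& name) {
  ParsedName out;
  out.full_name = name;  // keep the original input, as the docs describe

  // Tokenise on whitespace.
  std::istringstream iss(name);
  std::vector<std::string> tokens;
  std::string tok;
  while (iss >> tok) tokens.push_back(tok);

  // Work inwards from both ends of the half-open range [begin, end).
  std::size_t begin = 0, end = tokens.size();
  if (begin < end && kSalutations.count(tokens[begin])) out.salutation = tokens[begin++];
  if (begin < end && kNameSuffixes.count(tokens[end - 1])) out.suffix = tokens[--end];
  if (begin < end) out.first_name = tokens[begin++];
  if (begin < end) out.last_name = tokens[--end];

  // Whatever remains in the middle is treated as one or more middle names.
  for (std::size_t i = begin; i < end; ++i) {
    if (!out.middle_name.empty()) out.middle_name += ' ';
    out.middle_name += tokens[i];
  }
  return out;
}
```

Fields with no matching token stay as empty strings, which matches the documented return value for names that lack a salutation, middle name or suffix.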
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
% Generated by roxygen2 (4.1.1): do not edit by hand | ||
% Please edit documentation in R/RcppExports.R | ||
\name{parse_names} | ||
\alias{parse_names} | ||
\title{Parse Human Names} | ||
\usage{ | ||
parse_names(names) | ||
} | ||
\arguments{ | ||
\item{names}{a character vector of names to parse.} | ||
} | ||
\value{ | ||
a data.frame with the columns \code{salutation}, \code{first_name}, | ||
\code{middle_name}, \code{last_name}, \code{suffix} and \code{full_name} (which contains the original name). In the | ||
event that a name doesn't \emph{have} a salutation, middle name, suffix, or so on, an empty string will be in that | ||
field instead. | ||
} | ||
\description{ | ||
human names are complex things; sometimes people have honorifics, or not. Or a single middle name, or many. Or | ||
a compound surname, or not a compound surname but 'PhD' at the end of their name, and augh. | ||
\code{parse_names} provides a simple | ||
function for taking consistently formatted human names and splitting them into \code{salutation}, \code{first_name}, | ||
\code{middle_name}, \code{last_name} and \code{suffix}. It is capable of dealing with compound surnames, multiple middle names, | ||
and similar variations, and is fully vectorised. | ||
} | ||
\examples{ | ||
# Parse a simple name | ||
parse_names("Oliver Keyes") | ||
# Parse a more complex name | ||
parse_names("Hon. Oliver Timothy Keyes Esq.") | ||
} | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
#include "human_format.h" | ||
|
||
std::string human_format::comma_format(std::string name){ | ||
|
||
// Split on commas. If there are no commas, return. | ||
std::deque < std::string > split_string = split_parts(name, ","); | ||
if(split_string.size() < 2){ | ||
return name; | ||
} | ||
|
||
std::string output; | ||
std::string holding; | ||
|
||
// Comma formatting | ||
while(split_string.size() > 0){ | ||
unsigned int split_size = (split_string.size() - 1); | ||
if(match_component(split_string[split_size], suffixes)){ | ||
if(output.size() == 0){ | ||
output.append(split_string[split_size]); | ||
} else { | ||
output.append(" " + split_string[split_size]); | ||
} | ||
} else { | ||
if(output.size() == 0){ | ||
|
||
} | ||
holding.append(split_string[split_size]); | ||
} | ||
split_string.pop_back(); | ||
} | ||
|
||
if(holding.size() > 0){ | ||
output = holding + output; | ||
} | ||
|
||
return output; | ||
} | ||
|
||
std::vector < std::string > human_format::format_vector(std::vector < std::string > names){ | ||
|
||
unsigned int input_size = names.size(); | ||
|
||
// For each element, go nuts | ||
for(unsigned int i = 0; i < input_size; i++){ | ||
if((i % 10000) == 0){ | ||
Rcpp::checkUserInterrupt(); | ||
} | ||
|
||
names[i] = comma_format(names[i]); | ||
|
||
} | ||
|
||
return names; | ||
} |
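The commit message calls this formatter experimental, and `comma_format` as committed joins the non-suffix pieces without a separator. As a point of comparison, the following standalone sketch shows one plausible reading of the intended behaviour: turn "Last, First, Suffix" into "First Last Suffix". This is an assumption about intent, not the author's code. The names `comma_format_sketch`, `split_on_commas`, `trim` and the suffix list `kSuffixes` are hypothetical stand-ins for `split_parts`, `match_component` and `suffixes`, whose definitions are not shown in this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical suffix list standing in for the package's `suffixes` member.
static const std::set<std::string> kSuffixes = {"PhD", "Esq.", "Jr.", "Sr."};

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
  const std::size_t first = s.find_first_not_of(" \t");
  if (first == std::string::npos) return "";
  const std::size_t last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

// Split on commas and trim each piece.
static std::vector<std::string> split_on_commas(const std::string& name) {
  std::vector<std::string> parts;
  std::stringstream ss(name);
  std::string piece;
  while (std::getline(ss, piece, ',')) parts.push_back(trim(piece));
  return parts;
}

std::string comma_format_sketch(const std::string& name) {
  const std::vector<std::string> parts = split_on_commas(name);
  if (parts.size() < 2) return name;  // no commas: nothing to reorder

  // Separate suffixes from the rest, keeping the original order within each group.
  std::vector<std::string> core, suffixes;
  for (const std::string& p : parts) {
    if (p.empty()) continue;
    if (kSuffixes.count(p)) suffixes.push_back(p);
    else core.push_back(p);
  }

  // "Last, First" becomes "First Last"; suffixes are appended afterwards.
  std::reverse(core.begin(), core.end());
  std::string out;
  for (const std::string& p : core) {
    if (!out.empty()) out += ' ';
    out += p;
  }
  for (const std::string& p : suffixes) {
    if (!out.empty()) out += ' ';
    out += p;
  }
  return out;
}
```

Trimming each piece before matching matters here: after a split on ",", the second piece of "Keyes, Oliver" starts with a space, so an exact suffix lookup would otherwise miss " PhD".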
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
#include "human_parse.h" | ||
|
||
|
||
#ifndef __HUMAN_FORMAT__ | ||
#define __HUMAN_FORMAT__ | ||
|
||
class human_format: public human_parse { | ||
|
||
private: | ||
|
||
std::string comma_format(std::string name); | ||
|
||
public: | ||
|
||
std::vector < std::string > format_vector(std::vector < std::string > names); | ||
|
||
}; | ||
|
||
#endif |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,37 @@ | ||
#include "human_parse.h" | ||
#include "human_format.h" | ||
|
||
//' @title Parse Human Names | ||
//' @description human names are complex things; sometimes people have honorifics, or not. Or a single middle name, or many. Or | ||
//' a compound surname, or not a compound surname but 'PhD' at the end of their name, and augh. | ||
//' | ||
//' \code{parse_names} provides a simple | ||
//' function for taking consistently formatted human names and splitting them into \code{salutation}, \code{first_name}, | ||
//' \code{middle_name}, \code{last_name} and \code{suffix}. It is capable of dealing with compound surnames, multiple middle names, | ||
//' and similar variations, and is fully vectorised. | ||
//' | ||
//' @param names a character vector of names to parse. | ||
//' | ||
//' @return a data.frame with the columns \code{salutation}, \code{first_name}, | ||
//' \code{middle_name}, \code{last_name}, \code{suffix} and \code{full_name} (which contains the original name). In the | ||
//' event that a name doesn't \emph{have} a salutation, middle name, suffix, or so on, an empty string will be in that | ||
//' field instead. | ||
//' | ||
//' @examples | ||
//' # Parse a simple name | ||
//' parse_names("Oliver Keyes") | ||
//' | ||
//' # Parse a more complex name | ||
//' parse_names("Hon. Oliver Timothy Keyes Esq.") | ||
//' | ||
//' @export | ||
// [[Rcpp::export]] | ||
-DataFrame humaniformat(std::vector < std::string > names){ | ||
+DataFrame parse_names(std::vector < std::string > names){ | ||
human_parse parse_inst; | ||
return parse_inst.parse_vector(names); | ||
} | ||
|
||
// [[Rcpp::export]] | ||
std::vector < std::string > format_names(std::vector < std::string > names){ | ||
human_format format_inst; | ||
return format_inst.format_vector(names); | ||
} |
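The `format_names` wrapper hands the whole vector to `format_vector`, which rewrites each element in place and calls `Rcpp::checkUserInterrupt()` once every 10,000 elements so long-running calls can be cancelled from R. The loop pattern can be sketched without Rcpp as follows. The function `format_vector_sketch` and its callback parameters are hypothetical, with `check_interrupt` standing in for `Rcpp::checkUserInterrupt()` and `format_one` standing in for `comma_format`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Apply `format_one` to every element, polling `check_interrupt` periodically.
// Both callbacks are illustrative stand-ins, not part of the package.
std::vector<std::string> format_vector_sketch(
    std::vector<std::string> names,
    const std::function<std::string(const std::string&)>& format_one,
    const std::function<void()>& check_interrupt,
    std::size_t check_every = 10000) {
  for (std::size_t i = 0; i < names.size(); ++i) {
    // Poll on the first element and then once per `check_every` elements.
    if (check_every != 0 && i % check_every == 0) {
      check_interrupt();
    }
    names[i] = format_one(names[i]);
  }
  return names;
}
```

Polling only every few thousand elements keeps the cost of the check negligible next to the per-name string work, while still giving the user a way to abort on large inputs.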