Assuming I have a list of shifts for an event (in the format start date/time, end date/time), is there some sort of algorithm I could use to create a generalized summary of the schedule? It is quite common for most of the shifts to fall into some sort of common recurrence pattern (i.e. Mondays from 9:00 am to 1:00 pm, Tuesdays from 10:00 am to 3:00 pm, etc.). However, there can (and will) be exceptions to this rule (e.g. one of the shifts fell on a holiday and was rescheduled for the next day). It would be fine to exclude those from my "summary", as I'm looking to provide a more general answer to the question of when this event usually occurs.
I guess I'm looking for some sort of statistical method to determine the day and time occurrences and create a description based on the most frequent occurrences found in the list. Is there some sort of general algorithm for something like this? Has anyone created something similar?
Ideally I'm looking for a solution in C# or VB.NET, but don't mind porting from any other language.
Thanks in advance!
You may use Cluster Analysis.
Clustering is a way to segregate a set of data into similar components (subsets). The "similarity" concept involves some definition of "distance" between points. Many formulas for the distance exist, the most common being the usual Euclidean distance.
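To make the "distance" idea concrete, here is a minimal Python sketch (not part of the Mathematica session below). It assumes each shift is encoded as a (day, start hour, end hour) tuple, and `shift_distance` is just an illustrative name.

```python
import math

def shift_distance(a, b):
    """Euclidean distance between two shifts given as (day, start_hour, end_hour)."""
    return math.dist(a, b)

# Two Monday shifts that differ by half an hour at the start are close...
near = shift_distance((1, 10.0, 15.0), (1, 10.5, 15.0))
# ...while the same hours on Monday and Wednesday are two "days" apart.
far = shift_distance((1, 10.0, 15.0), (3, 10.0, 15.0))
print(near, far)
```

Note that with this encoding one day counts as much as one hour; whether that weighting suits your data is a modelling choice.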
Before pointing you to the quirks of the trade, let's show a practical case for your problem, so you may get involved in the algorithms and packages, or discard them upfront.
For simplicity, I modelled the problem in Mathematica, because Cluster Analysis is included in the software and is very straightforward to set up.
First, generate the data. The format is { DAY, START TIME, END TIME }.
Each start and end time has a random offset added (+half an hour, zero, or -half an hour) to show that the algorithm can cope with "noise".
There are three days, two shifts per day, and one extra "anomalous" shift (the last one), which starts at 7 AM and ends at 9 AM (poor guys!).
There are 150 events in each "normal" shift and only two in the exceptional one.
As you can see, some shifts are not very far apart from each other.
I include the code in Mathematica, in case you have access to the software. I'm trying to avoid using the functional syntax, to make the code easier to read for "foreigners".
Here is the data generation code:
Rn[] := 0.5 * RandomInteger[{-1, 1}];
monshft1 = Table[{ 1 , 10 + Rn[] , 15 + Rn[] }, {150}]; (* 1 *)
monshft2 = Table[{ 1 , 12 + Rn[] , 17 + Rn[] }, {150}]; (* 2 *)
wedshft1 = Table[{ 3 , 10 + Rn[] , 15 + Rn[] }, {150}]; (* 3 *)
wedshft2 = Table[{ 3 , 14 + Rn[] , 17 + Rn[] }, {150}]; (* 4 *)
frishft1 = Table[{ 5 , 10 + Rn[] , 15 + Rn[] }, {150}]; (* 5 *)
frishft2 = Table[{ 5 , 11 + Rn[] , 15 + Rn[] }, {150}]; (* 6 *)
monexcp = Table[{ 1 , 7 + Rn[] , 9 + Rn[] }, {2}]; (* 7 *)
Now we join the data, obtaining one big dataset:
data = Join[monshft1, monshft2, wedshft1, wedshft2, frishft1, frishft2, monexcp];
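If you don't have Mathematica at hand, the same test data could be generated along these lines in Python. This is only a rough port: the helper names `rn` and `make_shift` and the fixed seed are my own choices, not anything from the Mathematica code.

```python
import random

def rn(rng):
    """Noise term: -0.5, 0 or +0.5 hours, mirroring Rn[] above."""
    return 0.5 * rng.randint(-1, 1)

def make_shift(rng, day, start, end, count):
    """Return `count` noisy copies of one nominal shift."""
    return [(day, start + rn(rng), end + rn(rng)) for _ in range(count)]

rng = random.Random(0)  # arbitrary fixed seed, only for repeatability

# (day, start, end, number of events); day 1 = Monday, 3 = Wednesday, 5 = Friday
spec = [
    (1, 10, 15, 150), (1, 12, 17, 150),
    (3, 10, 15, 150), (3, 14, 17, 150),
    (5, 10, 15, 150), (5, 11, 15, 150),
    (1, 7, 9, 2),  # the "anomalous" shift
]
data = [row for s in spec for row in make_shift(rng, *s)]
print(len(data))  # 6 * 150 + 2 = 902
```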
Let's run a cluster analysis for the data:
clusters = FindClusters[data, 7, Method->{"Agglomerate","Linkage"->"Complete"}]
"Agglomerate" and "Linkage" -> "Complete" are two fine tuning options of the clustering methods implemented in Mathematica. They just specify we are trying to find very compact clusters.
I asked for 7 clusters. If the right number of shifts is unknown, you can try several reasonable values and compare the results, or let the algorithm choose a suitable value.
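For those porting this to another language, the idea behind agglomerative clustering with complete linkage can be sketched in a few lines of Python. This is a naive illustration of the technique, not what FindClusters does internally, and it is far too slow for large inputs; the function name and the tiny sample are made up for the example.

```python
import math

def complete_linkage(points, k):
    """Repeatedly merge the two closest clusters until only k remain.

    The distance between two clusters is the largest distance between
    any pair of their points (complete linkage), which favours compact
    clusters.
    """
    clusters = [[p] for p in points]

    def dist(a, b):
        return max(math.dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i is still valid
        clusters[i].extend(merged)
    return clusters

# Tiny sample: (day, start hour, end hour)
points = [
    (1, 10, 15), (1, 10.5, 15), (1, 10, 14.5),  # Monday 10 to 15, with noise
    (1, 12, 17), (1, 12.5, 17),                 # Monday 12 to 17
    (3, 10, 15), (3, 10, 15.5),                 # Wednesday 10 to 15
    (1, 7, 9),                                  # the exception
]
clusters = complete_linkage(points, 4)
for c in clusters:
    print(c)
```

In a real port you would more likely rely on an existing clustering library than on this loop.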
We can get a chart of the results, with each cluster in a different color (don't mind the code):
ListPointPlot3D[ clusters,
PlotStyle->{{PointSize[Large], Pink}, {PointSize[Large], Green},
{PointSize[Large], Yellow}, {PointSize[Large], Red},
{PointSize[Large], Black}, {PointSize[Large], Blue},
{PointSize[Large], Purple}, {PointSize[Large], Brown}},
AxesLabel -> {"DAY", "START TIME", "END TIME"}]
In the resulting plot you can see our seven clusters clearly apart.
That solves part of your problem: identifying the data. Now you also want to be able to label it.
So, we'll get each cluster and take means (rounded):
Table[Round[Mean[clusters[[i]]]], {i, 7}]
The result is:
Day Start End
{"1", "10", "15"},
{"1", "12", "17"},
{"3", "10", "15"},
{"3", "14", "17"},
{"5", "10", "15"},
{"5", "11", "15"},
{"1", "7", "9"}
And with that you recover your seven classes.
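The labelling step, together with the "leave out the exceptions" part of the question, could look roughly like this in Python. The `min_share` cutoff and the assumption that day 1 means Monday are mine; pick whatever threshold fits your data.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]  # assumes day 1 = Monday

def summarize(clusters, min_share=0.05):
    """Rounded mean of each cluster, skipping clusters that hold less
    than `min_share` of all events (treated as exceptions)."""
    total = sum(len(c) for c in clusters)
    summary = []
    for c in clusters:
        if len(c) / total < min_share:
            continue  # too rare to belong in the general summary
        means = [sum(col) / len(c) for col in zip(*c)]
        summary.append(tuple(round(m) for m in means))
    return summary

def describe(label):
    day, start, end = label
    return f"{DAYS[day - 1]} {start}:00 to {end}:00"

clusters = [
    [(1, 10, 15), (1, 10.5, 15), (1, 10, 14.5), (1, 9.5, 15)],
    [(3, 14, 17), (3, 14.5, 17), (3, 14, 16.5), (3, 13.5, 17.5)],
    [(1, 7, 9)],  # a single rescheduled shift
]
labels = summarize(clusters, min_share=0.2)
print([describe(l) for l in labels])
```

With this small sample the single 7-to-9 shift falls below the cutoff, so only the two regular shifts are described.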
Now, perhaps you want to classify the shifts regardless of the day. If the same people do the same task at the same time every day, it's not useful to call it the "Monday shift from 10 to 15", because it also happens on Wednesdays and Fridays (as in our example).
Let's analyze the data disregarding the first column:
clusters=
FindClusters[Take[data, All, -2],Method->{"Agglomerate","Linkage"->"Complete"}];
In this case, we are not selecting the number of clusters to retrieve, leaving the decision to the package.
In the resulting plot you can see that five clusters have been identified.
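Why five? Dropping the day column from the seven nominal shifts used to generate the data leaves only five distinct (start, end) pairs, as this small Python check shows (the list simply restates the nominal values from the generation code):

```python
# Nominal (day, start, end) of the seven generated shifts
nominal = [
    (1, 10, 15), (1, 12, 17),
    (3, 10, 15), (3, 14, 17),
    (5, 10, 15), (5, 11, 15),
    (1, 7, 9),
]

# Keep only the last two columns, like Take[data, All, -2]
times_only = sorted({row[-2:] for row in nominal})
print(times_only)
# [(7, 9), (10, 15), (11, 15), (12, 17), (14, 17)]
```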
Let's try to "label" them as before:
Grid[Table[Round[Mean[clusters[[i]]]], {i, 5}]]
The result is:
START END
{"10", "15"},
{"12", "17"},
{"14", "17"},
{"11", "15"},
{ "7", "9"}
Which is exactly what we "suspected": there are repeated events each day at the same time that could be grouped together.
If you have (or plan to have) shifts that start on one day and end on the next, it's better to model them as
{Start-Day Start-Hour Length} // Correct!
than as
{Start-Day Start-Hour End-Day End-Hour} // Incorrect!
That's because, as with any statistical method, the correlation between the variables must be made explicit, or the method fails miserably. The principle could be stated as "keep your candidate data normalized". Both ideas amount to almost the same thing: the attributes should be independent.
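Since the question starts from start and end date/times, the conversion to the recommended {Start-Day Start-Hour Length} form might look like this in Python. The function name is illustrative, and I'm assuming day 1 = Monday as in the data above.

```python
from datetime import datetime

def to_features(start, end):
    """Turn a (start, end) datetime pair into (day, start_hour, length_hours)."""
    day = start.isoweekday()                       # 1 = Monday ... 7 = Sunday
    start_hour = start.hour + start.minute / 60
    length = (end - start).total_seconds() / 3600  # works across midnight
    return (day, start_hour, length)

# A night shift from Friday 22:00 to Saturday 06:00
night = to_features(datetime(2024, 1, 5, 22, 0), datetime(2024, 1, 6, 6, 0))
print(night)  # (5, 22.0, 8.0)
```

The overnight shift stays a single point with a length of 8 hours, instead of having an end hour that is "smaller" than its start hour.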
--- Edit end ---
By now I guess you understand pretty well what kind of things you can do with this kind of analysis.
HTH!