Paginating article content in .NET

Author: 袖梨 2022-06-25

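Here is the problem: article content contains a lot of HTML tags. Cutting it with SubString can slice straight through a tag, possibly in the middle of one of its attributes, and the resulting string is garbled. The cut therefore has to filter out HTML tags as it goes.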
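
To make the problem concrete, here is a minimal sketch of a naive cut. `NaiveCutDemo` and `NaiveCut` are hypothetical names used only for illustration; they are not part of the article's code.

```csharp
using System;

public static class NaiveCutDemo
{
    // Takes the first `size` characters with no awareness of HTML markup,
    // so the cut can land inside a tag or an attribute value.
    public static string NaiveCut(string html, int size)
    {
        return html.Substring(0, Math.Min(size, html.Length));
    }
}
```

For example, `NaiveCutDemo.NaiveCut("<p class=\"intro\">Hello</p>", 10)` returns `<p class="`, an unterminated tag that contains none of the visible text.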

I am not great at explaining things, so here is the code.

The code is as follows:


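// Namespaces required by the two methods below
using System.Collections;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

/// <summary>
/// Gets the article content split into pages.
/// </summary>
/// <param name="param">Article content</param>
/// <param name="size">Length of each page, not counting HTML</param>
/// <returns>The pages, in order</returns>
public static List<string> SubstringTo(string param, int size)
{
    // Strip HTML tags that WAP pages cannot display; this step is optional.
    // (NoHTML is a helper that is not shown in this article.)
    param = NoHTML(param);
    var length = param.Length;
    var being = 0;
    var list = new List<string>();
    while (true)
    {
        // Cut one page starting at `being`; the out parameter moves `being` to the start of the next page
        string str = SubstringToHTML(param, being, size, "", out being);
        list.Add(str);
        if (length <= being)
        {
            break;
        }
    }
    return list;
}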

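/// <summary>
/// Cuts a string by byte length (supports strings that contain HTML markup).
/// </summary>
/// <param name="param">The string to cut</param>
/// <param name="being">Index at which the cut starts</param>
/// <param name="length">Byte length to cut</param>
/// <param name="end">String appended to the end of the result</param>
/// <param name="index">Receives the index at which the next cut should start</param>
/// <returns>The cut string</returns>
public static string SubstringToHTML(string param, int being, int length, string end, out int index)
{
    string Pattern = null;
    MatchCollection m = null;
    StringBuilder result = new StringBuilder();
    int n = 0;
    char temp;
    bool isCode = false; // inside an HTML tag
    bool isHTML = false; // inside an HTML entity such as &nbsp;
    char[] pchar = param.ToCharArray();
    int i = 0;
    for (i = being; i < pchar.Length; i++)
    {
        temp = pchar[i];
        if (temp == '<')
        {
            isCode = true;
        }
        else if (temp == '&')
        {
            isHTML = true;
        }
        else if (temp == '>' && isCode)
        {
            //n = n - 1;
            isCode = false;
        }
        else if (temp == ';' && isHTML)
        {
            // The entity ends at ';'
            isHTML = false;
        }
        if (!isCode && !isHTML)
        {
            n = n + 1;
            // Characters that encode to more than one byte (e.g. Chinese) count as two
            if (System.Text.Encoding.Default.GetBytes(temp + "").Length > 1)
            {
                n = n + 1;
            }
        }
        result.Append(temp);
        if (n >= length)
        {
            break;
        }
    }
    index = i + 1;
    result.Append(end);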
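    // Remove paired HTML tags. My regex skills are weak, so this part is rough;
    // feel free to write a single regex that removes them all.
    // The patterns here are assumed: first drop tags that take no closing tag,
    // then drop matched pairs (keeping their inner text) until nothing changes.
    string temp_result = result.ToString();
    temp_result = Regex.Replace(temp_result, @"(?i)</?(br|hr|img|input|meta|link)\b[^<>]*>", "");
    string previous;
    do
    {
        previous = temp_result;
        temp_result = Regex.Replace(temp_result, @"(?is)<([a-zA-Z][a-zA-Z0-9]*)\b[^<>]*>(.*?)</\1>", "$2");
    } while (temp_result != previous);
    // Use a regex to collect closing tags that have no opening tag in this page
    Pattern = (@"</([a-zA-Z][a-zA-Z0-9]*)\s*>");
    m = Regex.Matches(temp_result, Pattern);
    ArrayList bengHTML = new ArrayList();
    foreach (Match mt in m)
    {
        bengHTML.Add(mt.Result("$1"));
    }
    // Prepend the missing opening tags. The first closer belongs to the innermost
    // element, so its opener is inserted first and later ones end up outside it.
    for (int nn = 0; nn < bengHTML.Count; nn++)
    {
        result.Insert(0, "<" + bengHTML[nn] + ">");
    }
    // Use a regex to collect opening tags that are never closed in this page
    Pattern = (@"<([a-zA-Z][a-zA-Z0-9]*)\b[^<>]*>");
    m = Regex.Matches(temp_result, Pattern);
    ArrayList endHTML = new ArrayList();
    foreach (Match mt in m)
    {
        endHTML.Add(mt.Result("$1"));
    }
    // Append the missing closing tags, innermost first
    for (int nn = endHTML.Count - 1; nn >= 0; nn--)
    {
        result.Append("</");
        result.Append(endHTML[nn]);
        result.Append(">");
    }
    return result.ToString();
}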
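
As an alternative to the regex-based balancing in SubstringToHTML, the same idea can be sketched with a stack that tracks which tags are currently open. This is only a sketch under stated assumptions: `TagBalancer`, `Balance`, and the short list of void tags are hypothetical names and choices made for illustration, not the author's code. Opening tags that get prepended carry no attributes, because the page fragment does not contain them.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class TagBalancer
{
    // Elements that never take a closing tag (a partial list, chosen for illustration).
    private static readonly HashSet<string> VoidTags = new HashSet<string>(
        StringComparer.OrdinalIgnoreCase) { "br", "hr", "img", "input", "meta", "link" };

    // Group 1 is "/" for a closing tag; group 2 is the tag name.
    private static readonly Regex TagPattern =
        new Regex(@"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*>");

    public static string Balance(string fragment)
    {
        var open = new Stack<string>();          // tags opened but not yet closed
        var orphanClosers = new List<string>();  // closing tags with no opener in the fragment

        foreach (Match m in TagPattern.Matches(fragment))
        {
            string name = m.Groups[2].Value;
            if (VoidTags.Contains(name) || m.Value.EndsWith("/>")) continue;

            if (m.Groups[1].Value.Length == 0)
            {
                open.Push(name);
            }
            else if (open.Count > 0 &&
                     string.Equals(open.Peek(), name, StringComparison.OrdinalIgnoreCase))
            {
                open.Pop();
            }
            else
            {
                orphanClosers.Add(name);
            }
        }

        var result = new StringBuilder(fragment);
        // The first orphan closer belongs to the innermost element, so insert its
        // opener first; each later insert at position 0 wraps around it.
        foreach (string name in orphanClosers)
            result.Insert(0, "<" + name + ">");
        // Close whatever is still open, innermost first.
        while (open.Count > 0)
            result.Append("</").Append(open.Pop()).Append('>');
        return result.ToString();
    }
}
```

With this sketch, `TagBalancer.Balance("<p>Hello <b>wor")` gives `<p>Hello <b>wor</b></p>`, and the following page `TagBalancer.Balance("ld</b></p>")` gives `<p><b>ld</b></p>`.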

Summary:

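Paginating an article differs somewhat from paginating database results, and there are several ways to do it. One common approach is to save the article to the database in segments and decide which segment to show when reading it back. Another, which I use often, is to insert the editor's page-break marker wherever a page should end, split the content on that marker when reading it back, and then loop over the pieces with for to page through them. The implementation above works the same way once the content has been cut into pieces.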
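
The page-break approach can be sketched roughly as follows. The marker string `<!--pagebreak-->` and the names `PageBreakPager`, `Split`, and `GetPage` are hypothetical; use whatever marker your editor actually inserts.

```csharp
using System;

public static class PageBreakPager
{
    // Hypothetical marker, assumed for illustration only.
    public const string Marker = "<!--pagebreak-->";

    // Splits the stored article into pages at every marker, dropping empty pieces.
    public static string[] Split(string content)
    {
        return content.Split(new[] { Marker }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Returns the requested 1-based page, clamped to the valid range.
    public static string GetPage(string content, int pageNumber)
    {
        string[] pages = Split(content);
        if (pages.Length == 0) return string.Empty;
        int index = Math.Min(Math.Max(pageNumber, 1), pages.Length) - 1;
        return pages[index];
    }
}
```

Because the author decides where each marker goes, every page can be kept a well-formed HTML fragment, so no tag balancing is needed with this approach.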