How can I compress images? #194
Hello, it's important that you always use the same reference when doing these operations, so in your case, you should give the page reference in

I don't see where you're compressing the image, so I can't validate that part. I'm also not sure whether you can write to the output path like that while you still have the file open.
Thank you for your reply.

```go
// method 1
Page: requests.Page{
	ByIndex: &requests.PageByIndex{
		Document: pdfDoc.Document,
		Index:    i,
	},
},

// method 2
Page: requests.Page{
	ByReference: &pdfPage.Page,
},
```

Am I right?

Second:

```go
import (
	"bytes"
	"fmt"
	"image"
	"image/jpeg"
	"os"
)

// writeRawFile, writePNGFile and writeGIFFile are helpers not shown here.
func SaveImageFromData(data []byte, filePath string) (string, error) {
	writeRawFile(filePath, data)
	// Decode the image directly with image.Decode.
	// image.Decode only recognizes formats whose decoder packages
	// (image/jpeg, image/png, image/gif) are imported somewhere in the program.
	img, format, err := image.Decode(bytes.NewBuffer(data))
	if err != nil {
		return "", fmt.Errorf("cannot decode image: %v", err)
	}
	// Save the file according to its format
	switch format {
	case "jpeg", "jpg":
		filePath = fmt.Sprintf("%s.jpeg", filePath)
		return filePath, writeJPEGFile(filePath, img)
	case "png":
		filePath = fmt.Sprintf("%s.png", filePath)
		return filePath, writePNGFile(filePath, img)
	case "gif":
		filePath = fmt.Sprintf("%s.gif", filePath)
		return filePath, writeGIFFile(filePath, img)
	default:
		filePath = fmt.Sprintf("%s.raw", filePath)
		return filePath, writeRawFile(filePath, data)
	}
}

func writeJPEGFile(filename string, img image.Image) error {
	outFile, err := os.Create(filename)
	if err != nil {
		return err
	}
	defer outFile.Close()
	opts := &jpeg.Options{Quality: 60} // Adjust the quality as needed
	return jpeg.Encode(outFile, img, opts)
}
```

My question is this. When I decode an image object with the /DCTDecode filter (which suggests JPEG), it works, and Go's image.Decode detects the right format:

```
jpeg filter:[DCTDecode] bitmap buffer len:199584 format:2 FPDF_BITMAP_FORMAT_BGR {Width:167 Height:396 HorizontalDPI:96 VerticalDPI:96 BitsPerPixel:24 Colorspace:2 MarkedContentID:-1}
```

But when I decode an image object with the /FlateDecode filter (which I took to mean PNG), it fails, and image.Decode returns the error "unknown format":

```
filter:[FlateDecode] bitmap buffer len:226800 format:2 FPDF_BITMAP_FORMAT_BGR {Width:540 Height:140 HorizontalDPI:431.78244 VerticalDPI:431.78238 BitsPerPixel:24 Colorspace:2 MarkedContentID:-1}
image: unknown format
```

I tested three ways to extract the image object data: FPDFImageObj_GetImageDataDecoded, FPDFImageObj_GetImageDataRaw, and FPDFImageObj_GetBitmap + FPDFBitmap_GetBuffer. I passed each of these results to SaveImageFromData separately. So there must be some information I'm not using to extract the PNG image correctly.
Correct, if you want to make changes to the page, use

I don't think these are exactly correct; I don't think the FlateDecode filter indicates that it's PNG. It's probably that

I think your best bet would be

Use a combination of the following methods:

Use a Go

Probably you only want to compress
Thank you for your reply. I've made a lot of progress, but I'm running into a couple of new issues.

```go
func ConvertToJPEG(width, height, stride int, data []byte, outputPath string, quality int, format int) error {
	// Create an RGBA image
	img, err := RenderImage(data, width, height, stride, format)
	if err != nil {
		return err
	}
	// Create the output file
	outFile, err := os.Create(outputPath)
	if err != nil {
		return err
	}
	defer outFile.Close()
	// Set the JPEG compression quality
	jpegOptions := &jpeg.Options{Quality: quality}
	// Encode the image as JPEG and write it to the output file
	if err := jpeg.Encode(outFile, img, jpegOptions); err != nil {
		return err
	}
	log.Printf("JPEG image saved to: %s", outputPath)
	return nil
}

func RenderImage(data []byte, width int, height int, stride int, format int) (image.Image, error) {
	switch enums.FPDF_BITMAP_FORMAT(format) {
	case enums.FPDF_BITMAP_FORMAT_BGR:
		fmt.Println("BGR")
		img := image.NewRGBA(image.Rect(0, 0, width, height))
		for y := 0; y < height; y++ {
			for x := 0; x < width; x++ {
				var r, g, b, a uint8
				// Compute the data index
				index := y*stride + x*3 // 3 bytes per pixel (BGR)
				if index+2 < len(data) { // make sure we stay in bounds
					b = data[index]
					g = data[index+1]
					r = data[index+2]
					a = 255 // alpha defaults to 255
				}
				img.Set(x, y, color.RGBA{r, g, b, a})
			}
		}
		return img, nil
		// other cases omitted
	}
	return nil, fmt.Errorf("unsupported image format: %d", format)
}
```

Q1: With some PDF files (I set JPEG quality = 90), the compressed result is bigger than the original file.

Q2: With some PDF files, some images are corrupted. Maybe the original image was a PNG file with transparency, but when I extract this image from the PDF, I get:

```
imageMetadataRes:{Width:540 Height:140 HorizontalDPI:431.78244 VerticalDPI:431.78238 BitsPerPixel:24 Colorspace:2 MarkedContentID:-1}
filter:[FlateDecode]
bitmap info:width:540 height:140 stride:1620 format:2 data len:226800
```

Because the format is 2 == FPDF_BITMAP_FORMAT_BGR, there is only R, G, B information.

Q3: With some PDF files, the compressed output has lost a lot of content.

I'm sorry to bother you with so many questions. Wish you a happy life 🕶 @jerbob92
Yes, I have noticed this as well, that's why I said you might not want to re-compress images that have a
The main issue might be that in Go, "no transparency" ends up black, while in other implementations it ends up white.
Are you sure that the text that is gone now was actually inside an image? Did you try saving the image to check whether the decoding was correct?
Hello, thank you for your reply.

Q1:
What confused me is that I couldn't get the transparency information for the images extracted from the PDF. FPDFBitmap_GetFormat returns 2 == FPDF_BITMAP_FORMAT_BGR, so there is no transparency info, and I'm not sure when to add a white background.

Q2: I have another question, about inserting the same image on every page. Here is my code:

```go
// Open the PDF document
filePdfDoc, err := instance.FPDF_LoadDocument(&requests.FPDF_LoadDocument{
	Path:     &filePath,
	Password: nil,
})
pageCount, err := instance.FPDF_GetPageCount(&requests.FPDF_GetPageCount{
	Document: filePdfDoc.Document,
})

// Get the watermark image (a PNG file)
watermarkTemp, err := os.Open(watermarkPath)
defer watermarkTemp.Close()
watermark, err := png.Decode(watermarkTemp)
var rgbaImg *image.RGBA
if rgba, ok := watermark.(*image.RGBA); ok {
	rgbaImg = rgba
} else {
	// If the image is not in RGBA format, convert it
	rgbaImg = image.NewRGBA(watermark.Bounds())
	draw.Draw(rgbaImg, rgbaImg.Bounds(), watermark, watermark.Bounds().Min, draw.Src)
}
watermarkWidth = rgbaImg.Rect.Dx()
watermarkHeight = rgbaImg.Rect.Dy()
watermarkBitmap, err := instance.FPDFBitmap_Create(&requests.FPDFBitmap_Create{
	Width:  watermarkWidth,
	Height: watermarkHeight,
	Alpha:  1,
})
watermarkBuffer, err := instance.FPDFBitmap_GetBuffer(&requests.FPDFBitmap_GetBuffer{
	Bitmap: watermarkBitmap.Bitmap,
})
watermarkStride, err := instance.FPDFBitmap_GetStride(&requests.FPDFBitmap_GetStride{
	Bitmap: watermarkBitmap.Bitmap,
})
stride := int(watermarkStride.Stride)

// Copy the PNG image data into the FPDF_BITMAP
for y := 0; y < watermarkHeight; y++ {
	srcStart := y * rgbaImg.Stride // use the source image's own row stride
	dstStart := y * stride
	for x := 0; x < watermarkWidth; x++ {
		// Compute the indices into the source image and the destination buffer
		srcIndex := srcStart + x*4
		dstIndex := dstStart + x*4
		// Copy the alpha channel
		watermarkBuffer.Buffer[dstIndex+3] = rgbaImg.Pix[srcIndex+3]
		// Swap the red and blue channels, and copy the green channel
		watermarkBuffer.Buffer[dstIndex] = rgbaImg.Pix[srcIndex+2]   // Blue
		watermarkBuffer.Buffer[dstIndex+1] = rgbaImg.Pix[srcIndex+1] // Green
		watermarkBuffer.Buffer[dstIndex+2] = rgbaImg.Pix[srcIndex]   // Red
	}
}

for pageIndex := 0; pageIndex < pageCount.PageCount; pageIndex++ {
	filePdfPageTemp, err := instance.FPDF_LoadPage(&requests.FPDF_LoadPage{
		Document: filePdfDoc.Document,
		Index:    pageIndex,
	})
	pageByIndex := requests.PageByIndex{
		Document: filePdfDoc.Document,
		Index:    pageIndex,
	}
	filePdfPage := requests.Page{
		ByIndex:     &pageByIndex,
		ByReference: &filePdfPageTemp.Page,
	}
	filePageWidth, err := instance.FPDF_GetPageWidth(&requests.FPDF_GetPageWidth{
		Page: filePdfPage,
	})
	// Get the page height
	filePageHeight, err := instance.FPDF_GetPageHeight(&requests.FPDF_GetPageHeight{
		Page: filePdfPage,
	})
	scale := math.Min(filePageHeight.Height, filePageWidth.Width) / 595
	watermarkImageObj, err := instance.FPDFPageObj_NewImageObj(&requests.FPDFPageObj_NewImageObj{
		Document: filePdfDoc.Document,
	})
	// Load the bitmap into the image object (an image object is an image placed on the page)
	_, err = instance.FPDFImageObj_SetBitmap(&requests.FPDFImageObj_SetBitmap{
		ImageObject: watermarkImageObj.PageObject,
		Bitmap:      watermarkBitmap.Bitmap,
	})
	// Adjust the size and position of the image object
	_, err = instance.FPDFImageObj_SetMatrix(&requests.FPDFImageObj_SetMatrix{
		ImageObject: watermarkImageObj.PageObject,
		Transform: structs.FPDF_FS_MATRIX{
			A: balabala,
			B: 0,
			C: 0,
			D: balabala,
			E: balabala,
			F: balabala,
		},
	})
	_, err = instance.FPDFPage_InsertObject(&requests.FPDFPage_InsertObject{
		Page:       filePdfPage,
		PageObject: watermarkImageObj.PageObject,
	})
	_, err = instance.FPDFPage_GenerateContent(&requests.FPDFPage_GenerateContent{
		Page: filePdfPage,
	})
	_, err = instance.FPDF_ClosePage(&requests.FPDF_ClosePage{
		Page: filePdfPageTemp.Page,
	})
}
_, err = instance.FPDF_SaveAsCopy(&requests.FPDF_SaveAsCopy{
	Document: filePdfDoc.Document,
	FilePath: &outputPath,
})
```
The problem is that you're trying to put in a color model that has no transparency ( In that case you can just copy
I wouldn't know about that; that's Pdfium internals, and this is just a library implementing it in Go. What you can try is calling
I'm sorry for not making it clear. The problem I'm having is this:

After adding the watermark, I get this:

Then I read this image back from the PDF with GetBitmap and found that its format is FPDF_BITMAP_FORMAT_BGR, so the transparency information is lost. I get this:

Then I used your method of adding a white background:

```go
img := image.NewRGBA(image.Rect(0, 0, width, height))
for y := 0; y < height; y++ {
	for x := 0; x < width; x++ {
		var r, g, b, a uint8
		index := y*stride + x*3 // 3 bytes per pixel (BGR)
		if index+2 < len(data) { // make sure we stay in bounds
			b = data[index]
			g = data[index+1]
			r = data[index+2]
			a = 0 // set to zero
		}
		// Note: color.RGBA is alpha-premultiplied, so r, g, b normally should not exceed a.
		img.Set(x, y, color.RGBA{r, g, b, a})
	}
}
// Composite onto a white background once, after all pixels are set.
imageWithWhiteBackground := image.NewRGBA(img.Bounds())
draw.Draw(imageWithWhiteBackground, imageWithWhiteBackground.Bounds(), image.NewUniform(color.White), image.Point{}, draw.Src)
draw.Draw(imageWithWhiteBackground, imageWithWhiteBackground.Bounds(), img, img.Bounds().Min, draw.Over)
img = imageWithWhiteBackground
```

After this, I used loadJpegInline and got this: yes, I did get decent results. But I really do have two questions.

1. I can do this by adding a white background, but I don't know when, or under what circumstances, to do it, because the image info I get doesn't give me any transparency information.

logo.pdf

Wish you a good day.
That doesn't really matter: if you insert it as JPEG, it will lose its alpha channel. I think you would have to use
I think you should do it when the image format is BGR, and not when it's BGRA, but I'm not sure, probably best to try and find out.
I'm not sure where that is coming from. Perhaps they are just compression artifacts; quality 50 is really low. Did you try a higher quality? Did you also try the other option (using a BGR-compatible image and just copying the image data over)?
Thank you for your reply, I will try it tomorrow. Thank you again.
I think I may have found the problem, according to this post: "How do I extract images from PDFs with no black background?". When I use FPDFPageObj_GetType, there is no type named FPDF_PAGEOBJ_IMAGE_MASK, so I can't get the mask image, and every time I get a black image like this:
I wouldn't know. Did you use a PDF inspection tool (for example iText's RUPS) to check whether that's actually the case?
I tried FPDFImageObj_GetRenderedBitmap, but I got this error:
You need to update your Pdfium, version
Q1: I wouldn't know about that; you would have to ask Pdfium.
Thank you, I will ask the Pdfium group.
I want to compress the images in a PDF, but what I'm doing doesn't work.
Here is my algorithm:
Here is my code. I can extract the images and try to compress them, but the compressed images are not loaded back into the PDF.
So no matter how many times I run it, the size of the PDF stays the same.
I'm stuck.