Selfhosted “plagiarism” checker with custom sources?

inspxtr@lemmy.world · 1 month ago

Wonder how the survey was sent out and whether that affected sampling.

Regardless, with ~3-4k responses, that’s disappointing, if not concerning.

I only have a more personal sense for Lemmy. Do you have a source for Lemmy gender diversity?

Anyway, what do you think are the underlying issues? And what would be some suggestions to the community to address them?

inspxtr@lemmy.world · edit-2 10 months ago

Hold up, are you sure you can’t view Discussions or Wiki? On which sites can you not view them?

I’m fine viewing them for public repos that I usually visit.

Asking to make sure that GitHub is not slowly rolling out this lockdown.

inspxtr@lemmy.world · 1 year ago

the whole premise of OP is that this monitors people, and many organizations use TOTP, which one could also use without internet connections or phones AFAIK.
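To illustrate why TOTP works without an internet connection: the code is derived only from a shared secret and the current time. Below is a minimal sketch of RFC 6238 with HMAC-SHA1, using only the Python standard library. The function name `totp` and its parameters are my own choices for illustration, not any particular authenticator's API.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    # The moving factor is the number of `step`-second intervals since the Unix epoch.
    now = time.time() if at is None else at
    counter = int(now // step)
    # HOTP (RFC 4226): HMAC over the 8-byte big-endian counter.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Both sides only need the secret and a roughly synchronized clock, so a hardware token or an offline app can generate the same code as the server.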

I’m in academia and I wish this were implemented more widely. Data breaches are getting quite common, and GitHub is so entwined with software engineering that it is critical to increase security measures.

inspxtr@lemmy.world · 1 year ago

or maybe most of them in a folder? and one file that defines their locations for environment variables
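A rough sketch of what I mean, assuming a convention where `NAME_FILE` holds the path to a file containing the secret and plain `NAME` is only a fallback. The helper name `read_secret` and the `_FILE` suffix are assumptions for illustration, not something this project necessarily supports.

```python
import os
from pathlib import Path
from typing import Mapping


def read_secret(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return a secret from the file named by NAME_FILE, else from NAME."""
    # Prefer the file location, so the value itself never sits in the environment.
    path = environ.get(f"{name}_FILE")
    if path:
        return Path(path).read_text(encoding="utf-8").strip()
    # Fall back to a plain environment variable if no file location is defined.
    if name in environ:
        return environ[name]
    raise KeyError(f"neither {name}_FILE nor {name} is set")
```

The secret files can then live in one folder with tight permissions, and a single file (for example a `.env`) only lists their paths.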

inspxtr@lemmy.world · 1 year ago

what are the other alternatives to ENV that are more preferred in terms of security?

inspxtr@lemmy.world · edit-2 1 year ago

yeah I guess maybe the formatting and the verbosity seem a bit annoying? Wonder what alternative solutions there could be to better engage people from Mastodon, which is what this bot is trying to address.

edit: just to be clear, I’m not affiliated with the bot or its creator. This is just my observation from multiple posts I’ve seen this bot comment on.

inspxtr@lemmy.world · 1 year ago

I’m curious, why is this bot currently being downvoted for almost every comment it makes?

inspxtr@lemmy.world · edit-2 1 year ago

Thanks for the suggestions! I’m actually also looking into llamaindex for more conceptual comparison, though didn’t get to building an app yet.

Any general suggestions for locally hosted LLM with llamaindex by the way? I’m also running into some issues with hallucination. I’m using Ollama with llama2-13b and bge-large-en-v1.5 embedding model.

Anyway, aside from conceptual comparison, I’m also looking for more literal comparison. AFAIK, the choice of embedding model affects how similarity is defined. Most current LLM embedding models are fairly abstract, so the similarity is conceptual: “I have 3 large dogs” and “There are three canines that I own” will probably come out as very similar. Do you know which embedding model I should choose to get a more literal comparison?
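For reference, this is roughly what I mean by “literal”: a purely lexical score such as Jaccard overlap of word n-grams, with no embedding model involved. It's a minimal sketch using only the standard library. The function names and the choice of bigrams are my own assumptions, not something taken from llamaindex.

```python
import re
from typing import Set, Tuple


def word_ngrams(text: str, n: int = 2) -> Set[Tuple[str, ...]]:
    """Lowercase the text and return its set of word n-grams."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity of the two texts' word n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

With this measure the two dog sentences above share no bigrams and score 0.0, while near-verbatim copies score close to 1.0, which is the opposite of what an embedding model gives.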

That aside, like you indicated, there are some issues. One of them involves length. I hope to find something that can iteratively build up from similar sentences to similar paragraphs. I can take a stab at coding it up, but was just wondering if there are similar frameworks out there already that I can model it after.
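Something like the following is what I have in mind for going from sentences to paragraphs: score each sentence of one paragraph against its best match in the other, then report the fraction that clears a threshold. This is only a sketch. The naive sentence splitter, the use of `difflib.SequenceMatcher` as the sentence-level score, and the 0.8 threshold are all placeholder assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import List


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def sentence_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio between two sentences (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def paragraph_overlap(query: str, source: str, threshold: float = 0.8) -> float:
    """Fraction of query sentences with a close match somewhere in source."""
    query_sentences = split_sentences(query)
    source_sentences = split_sentences(source)
    if not query_sentences or not source_sentences:
        return 0.0
    matched = sum(
        1
        for q in query_sentences
        if max(sentence_similarity(q, s) for s in source_sentences) >= threshold
    )
    return matched / len(query_sentences)
```

The sentence-level score could be swapped for an embedding similarity later. The aggregation step stays the same.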

inspxtr@lemmy.world · edit-2 1 year ago

Selfhosted “plagiarism” checker with custom sources?

inspxtr@lemmy.world · 1 year ago

how bout baserow.io or nocodb cloud? Haven’t used them but I think they’re open source. But they don’t have mobile apps AFAIK for editing.

inspxtr@lemmy.world · 1 year ago

thanks! I’ll check it out.

inspxtr@lemmy.world · 1 year ago

thanks for your suggestions! I’m curious about mobile options in general, is there none for Android either?

inspxtr@lemmy.world · 1 year ago

Suggestion for Airtable alternative with mobile options?

inspxtr@lemmy.world · 1 year ago

privacy limitations from github:

Read Before Using!

As of right now, WrangleBot is still in development. This means that there are some limitations to what WrangleBot can do.

These limitations are as follows:

WrangleBot Cloud Sync does not yet utilize end-to-end encryption to protect your data, but uses TLS-Encryption to communicate and send data between you and the cloud sync servers. This means that your data is encrypted while it is in transit, but not while it is stored on the cloud sync servers.

We are committed to addressing these limitations and implementing new features as soon as possible. We are also committed to protecting your data and privacy. We will never sell your data to third parties, and we will never use your data for any other purpose than to provide you with the best possible experience with WrangleBot. Please review our data privacy policy here for more information.

Anyway, this looks interesting regardless. There seems to be an offline mode, so I assume this is selfhostable? What’s the backend of AI here? And are both the bot and the AI part self-hostable?

inspxtr@lemmy.world · 1 year ago

looks cool, wonder how that compares to wallabag. Is there a mobile app as well?

inspxtr@lemmy.world · 1 year ago

Thanks for the Floccus suggestion. It says it syncs over Nextcloud Bookmarks, does that mean you wouldn’t need a dedicated app except for Nextcloud?

inspxtr@lemmy.world · 1 year ago

I’m not entirely sure what you mean by “printable reports”. Would you maybe want to post an example sketch?

Anyway, have you considered maybe writing the variables to LaTeX, then rendering that to PDF?
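A minimal sketch of that idea, assuming the report is just a handful of named values: fill a LaTeX template with `string.Template` and escape the special characters first. The template contents and the field names (`title`, `count`, `average`) are made up for illustration.

```python
from string import Template
from typing import Any, Mapping

# Hypothetical report layout; $title, $count and $average are placeholders.
REPORT_TEMPLATE = Template(r"""\documentclass{article}
\begin{document}
\section*{$title}
Total items: $count \\
Average score: $average
\end{document}
""")

# Characters that LaTeX treats specially and their escaped forms.
LATEX_SPECIALS = {
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
}


def escape_latex(value: Any) -> str:
    """Convert a value to text and escape LaTeX special characters."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in str(value))


def render_report(variables: Mapping[str, Any]) -> str:
    """Fill the template with escaped values and return the .tex source."""
    escaped = {key: escape_latex(val) for key, val in variables.items()}
    return REPORT_TEMPLATE.substitute(escaped)
```

You would write the returned string to something like `report.tex` and run `pdflatex report.tex` on it, assuming a TeX distribution is installed.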

inspxtr@lemmy.world · 1 year ago

that looks cool! Do commenters need a github account to do that though?

inspxtr@lemmy.world · 1 year ago

thanks, and that’s great that this can be used for other static pages as well.

stupid general question: in the “install it yourself” guide, they say that this needs to be run on a VPS, for example with DigitalOcean. I’m thinking of deploying on fly.io, which I understand is like an alternative for Heroku. Is there a conceptual difference between these types of solutions (DigitalOcean vs fly.io for example) that might affect hosting?

inspxtr@lemmy.world · 1 year ago

Cool, thanks for the suggestions!

Never heard of webmentions, but I’ve heard some people have integrated Mastodon into their Jekyll pages. I wonder if that’s the same thing.

inspxtr@lemmy.world · 1 year ago

I’ve heard that it’s not very privacy-respecting, is that right?

inspxtr@lemmy.world · 1 year ago

Comment systems for static pages (Jekyll)?

inspxtr@lemmy.world · 1 year ago

lol what’s the context here?