Posts

8/8/2025

some paradoxes: carroll's pillow problem - A bag contains a counter, known to be either white or black. A white counter is put in, the bag is shaken, and a counter is drawn out, which proves to be white. What is now the chance of drawing a white counter? the chance is actually 2/3! coz if we list out all the equally likely scenarios (the original counter is white or black, and either counter could be the one drawn), only 3 are consistent with drawing a white counter, and in 2 of them the counter left in the bag is white. this is similar to many other paradoxes, like the boy or girl paradox and the monty hall problem. so many names for essentially the same thing but with different objects i guess.

Learning distributions with Variational Autoencoders: theory, geometry, and applications - since i presented at SWIM 2025, i've been to some of the talks! a variational autoencoder learns a probabilistic latent space that can generate new data samples by modeling the data distribution. it is trained by maximizing the Evidence Lower Bound (ELBO), i.e. the negative ELBO is the loss function, and an efficiency measure is u...
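going back to the pillow problem: here is a small sketch that just brute-forces the counting argument. the function name `pillow_problem` and the way the scenarios are encoded are my own choices for illustration, and it assumes the original counter is equally likely to be white or black and that either counter is equally likely to be drawn.

```python
from fractions import Fraction


def pillow_problem():
    """Enumerate the equally likely scenarios of carroll's pillow problem."""
    outcomes = []
    for original in ("white", "black"):  # assumed 50/50 prior on the original counter
        bag = [original, "white"]        # the added counter is always white
        for drawn_index in (0, 1):       # either counter is equally likely to be drawn
            drawn = bag[drawn_index]
            remaining = bag[1 - drawn_index]
            outcomes.append((drawn, remaining))

    # condition on what we observed: the drawn counter was white
    consistent = [o for o in outcomes if o[0] == "white"]
    # the next draw is white exactly when the remaining counter is white
    favourable = [o for o in consistent if o[1] == "white"]
    return Fraction(len(favourable), len(consistent)), len(consistent)


probability, scenarios = pillow_problem()
print(probability, "from", scenarios, "consistent scenarios")
```

the same "list everything, then throw away what contradicts the observation" pattern works for the boy or girl paradox and monty hall too, as long as the scenarios you list are really equally likely.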

7/22/2025

nvm google deepmind also got gold on the imo. they should test these models on the ioi too 🤔

tori! in diff geo we are learning about parametrized surfaces! this site says that a torus can be covered with one surface patch and thought of like a rectangular piece of rubber stretched around... but in our exercise we parametrize it like so and we need 4 patches to make sure we're working with open sets. i saw some other websites saying you need at least 2 patches?!? so idk. btw this 3d plotting calculator is great for visualizing things - https://c3d.libretexts.org/CalcPlot3D/index.html there are also different types of tori

frenet-serret equations: these just seem like a normal triplet of equations for the orthonormal basis along the curve (called the frenet-serret frame), but they're incredibly useful for investigating properties of curves. t = tangent vector, n = normal vector, b = binormal vector

rubiks cubing: i've been trying to cube recent...
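for reference, the frenet-serret triplet mentioned above is usually written as below. this is a sketch under assumptions that are not in the post: the curve is unit-speed (parametrized by arc length s), \kappa is the curvature and \tau is the torsion, and the sign convention for \tau is the common one (some books flip it).

```latex
\begin{aligned}
\mathbf{t}'(s) &= \kappa(s)\,\mathbf{n}(s) \\
\mathbf{n}'(s) &= -\kappa(s)\,\mathbf{t}(s) + \tau(s)\,\mathbf{b}(s) \\
\mathbf{b}'(s) &= -\tau(s)\,\mathbf{n}(s)
\end{aligned}
```

so the derivative of each frame vector is a combination of the other two, and the whole thing is controlled by just two functions of s.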

7/20/2025

this past week was the international math olympiad!! it was exciting to see the performance of people i've met in real life, and they all did very well, though i'm not a fair judge especially since i'm not that good at oly math. also, what's funny is that i thought there was no way ai would win a gold, and initial reports from matharena.ai showed that it couldn't even achieve a bronze medal... and then the day after openai just had to tell everyone that they had a "not very open" model that could get 35/42 (coordbashing geo lol). the proof style is very strange (see this github repo), almost like the ai has developed its own way of checking itself as it proceeds down the proof. people on r/singularity cheered! it will definitely be interesting to see the future of math competitions now, but it could be just like chess; after all, ai solving these problems is not really an apples to apples comparison to the students who do math contests in general, ...

7/12/2025

okayy so i was at a math camp and it was very fun! hm the favorite math thing i learned at camp would probably be functional inequalities since i've never seen them before and it is non-geo (oops). back to the exploration jungle tho! i finally read a post about sparse autoencoders although it was very confusing, and here are some main takeaways

1. dictionary learning of features - learning a dictionary such that sparse linear combinations of its elements make up the activations of a layer. we want to encourage sparsity (fewer dictionary features are needed to reconstruct each activation, which increases interpretability and efficiency)
2. but this is still hard: we have things like feature oversplitting (splitting features that should be cohesive) and infinite-width cookbook (memorizing examples such that inputs are directly put into the dictionary). other sparsity penalties, like different l norms, also run into issues like shrinkage and load balancing.
3. choosing an activation fu...
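to make takeaway 1 a bit more concrete, here is a minimal sketch of the usual sparse autoencoder objective: reconstruction error plus an l1 penalty on the feature coefficients. everything here is an assumption for illustration (the names `sae_loss`, `W_enc`, `b_enc`, `W_dec`, `l1_coeff`, the relu encoder, and the tiny hand-picked weights); it is not taken from the post, and a real one would be trained with an autodiff library rather than written with plain lists.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def sae_loss(activation, W_enc, b_enc, W_dec, l1_coeff):
    """Squared reconstruction error plus an L1 sparsity penalty for one activation."""
    # encode: relu keeps the feature coefficients non-negative and lets many be exactly 0
    pre = [p + b for p, b in zip(matvec(W_enc, activation), b_enc)]
    features = [max(0.0, p) for p in pre]
    # decode: the reconstruction is a linear combination of dictionary directions
    recon = matvec(W_dec, features)
    squared_error = sum((a - r) ** 2 for a, r in zip(activation, recon))
    l1_penalty = sum(abs(f) for f in features)
    return squared_error + l1_coeff * l1_penalty, features, recon


# hypothetical toy setup: 2-d activations, a dictionary of 3 features
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 features x 2 dims
b_enc = [0.0, 0.0, -1.0]
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # 2 dims x 3 features

loss, features, recon = sae_loss([1.0, 0.0], W_enc, b_enc, W_dec, l1_coeff=0.1)
print(loss, features, recon)
```

the l1 term is also where the shrinkage issue comes from: the penalty is smaller whenever the coefficients are smaller, so the optimizer is pushed to underestimate feature magnitudes even when the feature is really there.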

6/25/2025

i guess the post frequency is gradually becoming once a week T_T anyways i am EXPECTED to post smth on substack soon so 👀lol what are these blogger emojis

here is the most recent quanta article i'm reading: new pyramid shape that always lands the same side up - neat application of this thing -- space exploration! and it's interesting how ppl can just visualize these things, as someone who sucks at geometry T_T, esp in higher dimensions - conway was brilliant.

i'll just paste some literature review i did on some neural network stuff, some of the stuff is like copy-pasted from the paper abstracts (sorry!)

Activation Anomaly Analysis (Mar 2020): a novel approach for anomaly detection based on the hidden activation patterns of NNs, semi-supervised, purely data-driven anomaly detection solution, transferability of the algorithm. comprised of two parts: (1) a target network unrelated to the anomaly detection task, (2) an alarm network analyzing the target's activations. Expe...

6/19/2025

shap model (SHapley Additive exPlanations, great acronym btw) - interpreting machine learning model predictions based on game theory, model agnostic/post hoc approach, shows how much a certain feature pushed the output up or down

speculative decoding - make ai more efficient (algorithmic details here): a fast approximate model guesses several tokens ahead and the big model checks them in parallel, so the output is still autoregressive but cheaper to compute

i tried to watch this video on sheafification but i didn't understand anything other than the fact that i've basically forgotten everything from topology

um really cool article about how scientists are using ai for manipulating the brain, takes the analogy between ai and neuroscience a bit further --> developments in ai can help us improve our understanding of the brain and develop applications like readers to help ppl with dyslexia. i found the interviewee's response to the question about the ethics of having a digital copy of a brain to be very insightful, ...
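back to shap for a sec: the "how much a feature pushed the output up or down" number is a shapley value, and for a tiny model you can compute it exactly by averaging each feature's marginal contribution over every order in which features could be added. this is a sketch of that definition only; it does not use the shap library (which relies on much faster approximations), and `shapley_values`, `toy_model` and the feature names are hypothetical.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value(frozenset(present))
            present.add(f)
            after = value(frozenset(present))
            totals[f] += after - before  # how much adding f changed the output
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_model(present):
    """Hypothetical model output given the set of features that are 'switched on'."""
    out = 0.0
    if "age" in present:
        out += 2.0
    if "income" in present:
        out += 1.0
    if "age" in present and "income" in present:
        out += 1.0  # interaction term, shared between the two features
    return out


phi = shapley_values(["age", "income", "zip"], toy_model)
print(phi)
```

the values always add up to the difference between the full model output and the empty baseline, which is the "additive" part of the acronym. the catch is the factorial number of orderings, so this only works for a handful of features.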

6/12/2025

yes i'm very late. this article argues that mech interp is not that useful and it raises a lot of good points (i.e. in biology / other complex systems, we don't do the bottom up approach, a lot of the buzz from mech interp comes from cherrypicked results, the compression involved leads to the loss of edge cases, and so much has been invested but not much has come out; this post talks about how google stopped prioritizing SAEs because they weren't performing as well). i think i should definitely keep this in mind as i get more interested in interpretability; i've heard friends make such comments too. meanwhile this article by dario amodei argues for the importance of mech interp, tho he is from anthropic which may influence his viewpoints. he advocates for the govt to also get involved, citing all the disasters that may happen because ai is too "opaque". i get his points too; the race between the development of ai models and our understanding of them is pret...