7/12/2025
okayy so i was at a math camp and it was very fun!
hm the favorite math thing i learned at camp would probably be functional inequalities since i'd never seen them before and they're non-geo (oops).
back to the exploration jungle tho!
i finally read a post about sparse autoencoders although it was very confusing, and here are some main takeaways:
1. dictionary learning of features - learn a dictionary such that sparse linear combinations of its elements reconstruct the activations of a layer. we want to encourage sparsity (fewer dictionary features needed to reconstruct each activation), which improves interpretability and efficiency
2. but this is still hard: there are problems like feature oversplitting (splitting up features that should stay one cohesive feature) and, in the infinite-width limit, the dictionary just memorizing examples (inputs copied directly into the codebook). other sparsity penalties like different l-norms run into their own issues, such as shrinkage and load balancing.
3. choosing an activation function for the latents - inspired by compressed sensing, using top-k (keep only the k largest activations) to improve generalization and stability (rate distortion constant)
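to make the top-k idea concrete for myself, here's a toy numpy sketch of one forward pass of a top-k sparse autoencoder. all the names, dimensions, and random weights here are made up for illustration - the point is just that zeroing everything except the k largest latents forces each input to be reconstructed from at most k dictionary features:

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """One forward pass of a toy top-k sparse autoencoder.

    Keeps only the k largest latent activations per example; the rest
    are zeroed, so at most k dictionary features (rows of W_dec)
    participate in each reconstruction.
    """
    z = np.maximum(x @ W_enc + b_enc, 0.0)      # ReLU latent activations
    # indices of all but the k largest latents, per example
    idx = np.argsort(z, axis=-1)[..., :-k]
    np.put_along_axis(z, idx, 0.0, axis=-1)     # zero the non-top-k latents
    x_hat = z @ W_dec + b_dec                   # reconstruct from the sparse code
    return x_hat, z

rng = np.random.default_rng(0)
d, m, k = 16, 64, 4                             # activation dim, dictionary size, sparsity
x = rng.normal(size=(8, d))                     # fake "layer activations"
W_enc = rng.normal(size=(d, m)) * 0.1
W_dec = rng.normal(size=(m, d)) * 0.1
x_hat, z = topk_sae_forward(x, W_enc, np.zeros(m), W_dec, np.zeros(d), k)
print((z != 0).sum(axis=1))                     # at most k active features per example
```

(in a real SAE the weights would be trained to minimize reconstruction error; this just shows where the sparsity constraint bites.)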
diff geo class starting on monday... dunno what to expect but i will definitely be learning a lot :D
ellipsoid trick for sphere packing! very cool, breaking expectations of people who stick to finding convenient lattices
reading papers to try to figure out how to incorporate LegalBERT + BiGRU models?? to improve accuracy on a classification task