8/8/2025
some paradoxes:
carroll's pillow problem - A bag contains a counter, known to be either white or black. A white counter is put in, the bag is shaken, and a counter is drawn out, which proves to be white. What is now the chance of drawing a white counter?
the chance is actually 2/3! coz if we list out all the equally likely scenarios (the original counter is white or black, and either of the two counters gets drawn), only 3 of the 4 are consistent with drawing a white counter, and in 2 of those 3 the counter left in the bag is white. this is similar to many other conditional probability paradoxes, like the boy or girl paradox and the monty hall problem. so many names for essentially the same thing but different objects i guess.
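here's a tiny sketch of that counting argument, assuming the original counter is white or black with equal probability and that each of the two counters is equally likely to be drawn (the function name `pillow_problem` is just mine):

```python
from fractions import Fraction
from itertools import product


def pillow_problem():
    """Enumerate the 4 equally likely scenarios and condition on a white draw."""
    outcomes = []
    for original, drawn_index in product(["white", "black"], [0, 1]):
        # index 0 is the original counter, index 1 is the added white counter
        bag = [original, "white"]
        drawn = bag[drawn_index]
        remaining = bag[1 - drawn_index]
        outcomes.append((drawn, remaining))

    # keep only the scenarios where the drawn counter was white
    consistent = [o for o in outcomes if o[0] == "white"]
    # among those, count the ones where the counter left in the bag is white
    favorable = [o for o in consistent if o[1] == "white"]
    return Fraction(len(favorable), len(consistent))


print(pillow_problem())  # 2/3
```

three scenarios survive the conditioning and two of them leave a white counter behind, which is where the 2/3 comes from.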
Learning distributions with Variational Autoencoders: theory, geometry, and applications - since i presented at SWIM 2025, i've been to some of the talks! a variational autoencoder learns a probabilistic latent space and can generate new data samples by modeling the data distribution. it is trained by maximizing the Evidence Lower Bound (ELBO), i.e. the loss is the negative ELBO, and what makes this trainable efficiently is the reparameterization trick (a.k.a. stochastic backpropagation): sampling is rewritten as a deterministic function of the parameters plus independent noise, so gradients can flow through the sampling step.
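to make the ELBO and the reparameterization trick a bit more concrete, here's a minimal pure-python sketch. it assumes a diagonal gaussian encoder and a standard normal prior (a common setup, not necessarily the one from the talk), and all the function names are hypothetical:

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps elementwise, with eps ~ N(0, 1).

    The randomness lives only in eps, so z is a deterministic (and
    differentiable) function of mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def negative_elbo(recon_log_likelihood, mu, log_var):
    """Loss = -(reconstruction log-likelihood) + KL term."""
    return -recon_log_likelihood + kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)  # one latent sample
loss = negative_elbo(-3.0, [0.0, 1.0], [0.0, -2.0])
```

in a real VAE the reconstruction term comes from a decoder network and an autodiff library handles the gradients; here it's just a number passed in so the structure of the loss is visible.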
so there's been a bunch of buzz surrounding how 17-year-old hannah cairo disproved this famous conjecture by providing a counterexample: power of counterexamples (my ted talk!)!! i honestly do not really understand what the Mizohata-Takeuchi conjecture is even after reading some stuff, something about the boundedness of Fourier integral operators and restriction phenomena of the Fourier transform??
representation engineering - lowkey i've kept the paper talking about it bookmarked for a long time, but i think this is a cool area of ai safety that i can also consider thinking about; it is broader than mech interp. it's introduced in this paper: the goal of representation reading is to locate emergent representations for high-level concepts (truthfulness, utility, probability, morality, and emotion) and functions (processes such as lying and power-seeking) within a network, similar to doing a CT scan for the ai <--> bio analogy. the baseline technique is LAT (Linear Artificial Tomography), following these steps: (1) Designing Stimulus and Task, (2) Collecting Neural Activity, and (3) Constructing a Linear Model
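a toy sketch of the three LAT steps above, with big caveats: the "activations" are made-up 3-d vectors rather than real hidden states, and the linear model is a plain difference of means, which i'm swapping in as the simplest possible choice rather than claiming it's what the paper does. all names are hypothetical:

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def reading_direction(pos_acts, neg_acts):
    """Step 3: a unit vector pointing from the 'negative' mean to the 'positive' mean."""
    diff = [p - q for p, q in zip(mean_vector(pos_acts), mean_vector(neg_acts))]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]


def score(activation, direction):
    """Project an activation onto the reading direction."""
    return sum(a * d for a, d in zip(activation, direction))


# steps 1 and 2 (stimuli + collected activity) are faked with toy numbers:
# pretend these came from prompts eliciting honest vs. dishonest behavior
honest_acts = [[1.0, 0.2, 0.0], [0.8, -0.1, 0.1], [1.2, 0.0, -0.1]]
dishonest_acts = [[-1.0, 0.1, 0.0], [-0.9, 0.0, 0.2], [-1.1, -0.1, -0.2]]

direction = reading_direction(honest_acts, dishonest_acts)
scores = [score(a, direction) for a in honest_acts + dishonest_acts]
```

on this toy data the honest examples get positive scores and the dishonest ones get negative scores, which is the kind of separation a reading vector is supposed to give.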
different evaluation types: correlation, manipulation, termination, and recovery.
Representation Control seeks to modify or control the internal representations of concepts and functions. LoRRA stands for Low-Rank Representation Adaptation.
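the simplest version of control i can think of is adding a scaled concept direction to a hidden state. to be clear, this is not LoRRA itself, just a hedged sketch of activation steering on toy vectors, and `steer` and `alpha` are names i made up:

```python
def steer(hidden, direction, alpha):
    """Shift a hidden state along a concept direction by strength alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def projection(hidden, direction):
    """How much of the hidden state lies along the direction."""
    return sum(h * d for h, d in zip(hidden, direction))


concept = [1.0, 0.0, 0.0]    # hypothetical unit-norm concept direction
hidden = [0.5, -0.2, 0.3]    # hypothetical hidden state at some layer

steered = steer(hidden, concept, alpha=2.0)
```

since the direction has unit norm, the projection onto it goes up by exactly `alpha` while the other coordinates are left alone; a negative `alpha` would push the state the other way.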
some cool pics of the results (reminds me of the heatmaps i tried generating for my non-trivial thing):
basically, this is an activation-based paradigm for controlling LLMs consisting of representation reading and representation steering, which is similar to what i've been thinking about myself for a while.
here are some more resources from an alignment forum post, including current problems in the field. similar to how we have hypotheses in mech interp, there are also repE hypotheses: 1) linear representation hypothesis 2) resilience to activation mapping 3) LLMs already represent human-understandable concepts 4) direct control over model state.
now i dunno much earth science but apparently the earth's core is leaking?? evidence is ruthenium anomalies hm. video creds to Deposit Photos/Alamy
finally, in diff geo we're talking a lot about curvature. here's a list of the many types of curvature:
curvature (kappa) - measures how much a curve deviates from being a straight line
gaussian curvature (K) - intrinsic measure, product of the principal curvatures
mean curvature (H) - extrinsic measure, average of principal curvatures
normal curvature (kappa_n) - curvature of the curve obtained by intersecting the surface with a plane containing the surface normal and a chosen tangent direction at the point
principal curvature (kappa_1 and kappa_2) - maximum and minimum normal curvatures at a point on a surface
geodesic curvature (kappa_g) - measures how much a curve on a surface deviates from being a geodesic
first fundamental form - metric properties of a surface, such as lengths, angles, and areas, without considering how the surface is embedded in space
second fundamental form - how the normal vector changes along the surface
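to tie the list together: given the coefficients of the first fundamental form (E, F, G) and the second fundamental form (L, M, N) at a point, the standard formulas are K = (LN - M^2)/(EG - F^2) and H = (EN - 2FM + GL)/(2(EG - F^2)), and the principal curvatures are H ± sqrt(H^2 - K). here's a small sketch checking this on a sphere of radius R; the function names are mine, and the sphere coefficients assume the parametrization (R sin u cos v, R sin u sin v, R cos u) with the outward normal:

```python
import math


def curvatures(E, F, G, L, M, N):
    """Return (K, H, kappa_1, kappa_2) from fundamental form coefficients."""
    det_first = E * G - F * F
    K = (L * N - M * M) / det_first
    H = (E * N - 2.0 * F * M + G * L) / (2.0 * det_first)
    # clamp to avoid a tiny negative discriminant from rounding error
    root = math.sqrt(max(H * H - K, 0.0))
    return K, H, H + root, H - root


def sphere_forms(R, u):
    """(E, F, G, L, M, N) for a sphere of radius R at polar angle u, outward normal."""
    s2 = math.sin(u) ** 2
    return (R * R, 0.0, R * R * s2, -R, 0.0, -R * s2)


K, H, k1, k2 = curvatures(*sphere_forms(2.0, math.pi / 3))
```

analytically the sphere has K = 1/R^2 and H = -1/R with this choice of normal (flipping the normal flips the sign of H but not K, which matches K being intrinsic and H being extrinsic).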