6/25/2025

i guess the post frequency is gradually becoming once a week T_T anyways, i am EXPECTED to post smth on substack soon so 👀 lol what are these blogger emojis

here is the most recent quanta article i'm reading: new pyramid shape that always lands the same side up. there's a neat application of this thing: space exploration! and it's interesting how ppl can just visualize these things, speaking as someone who sucks at geometry T_T, esp in higher dimensions. conway was brilliant.

i'll just paste some literature review i did on some neural network stuff, some of the stuff is like copy-pasted from the paper abstracts (sorry!)

Activation Anomaly Analysis (Mar 2020) 

  • a novel semi-supervised, purely data-driven approach to anomaly detection based on the hidden activation patterns of NNs; the paper also emphasizes the transferability of the algorithm

  • comprised of two parts: 

    • a target network unrelated to the anomaly detection task 

    • an alarm network analyzing the target’s activations

  • Experiments report high F1, precision, and recall scores

  • Datasets used: MNIST, EMNIST, CSE-CIC-IDS2018 (intrusion detection data set containing network data along with anomaly labels)

    • To prevent class imbalance issues, loss for anomalous samples is weighted higher than for normal samples
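
a tiny sketch of what "weighting anomalous samples higher" can look like in practice. this is just a weighted binary cross-entropy on the alarm network's output, not the paper's actual code; the function name `weighted_bce` and the weight value 5.0 are made up for illustration.

```python
import math

def weighted_bce(probs, labels, anomaly_weight=5.0, eps=1e-12):
    """Mean binary cross-entropy where anomalous samples (label 1) count more.

    probs:  alarm-network outputs in [0, 1] (probability of "anomalous")
    labels: 1 for anomalous, 0 for normal
    anomaly_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        weight = anomaly_weight if y == 1 else 1.0
        total += -weight * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

# three normal samples and one anomaly that the alarm network under-scores
alarm_scores = [0.1, 0.2, 0.1, 0.3]
labels = [0, 0, 0, 1]
unweighted = weighted_bce(alarm_scores, labels, anomaly_weight=1.0)
weighted = weighted_bce(alarm_scores, labels, anomaly_weight=5.0)
print(unweighted, weighted)
```

with the weight turned up, the single missed anomaly dominates the loss, which is the point when anomalies are rare.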


DeepScan

- Detecting Adversarial Attacks via Subset Scanning of Autoencoder Activations and Reconstruction Error (2020)

Subset-scanning on internal activations:

  • Borrowed from anomalous-pattern detection: scan for the most anomalous subset of activations within the AE

  • Uses Non-Parametric Scan Statistics (NPSS) combined with the Linear Time Subset Scanning (LTSS) property to efficiently detect anomalies in hidden layers

  • Detects anomalies by comparing test activations to a “clean” background distribution, computing p-values, and identifying subsets with unusually high activation deviations 
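
to make the NPSS + LTSS idea concrete for myself, here's a minimal sketch. assumptions: each node already has one empirical p-value, the scoring function is Berk-Jones (one common NPSS choice, the paper may use others), and the grid of α thresholds is arbitrary. `berk_jones` and `scan_nodes` are names i made up. the LTSS part is that for each α you only need to check prefixes of the nodes sorted by p-value instead of all 2^N subsets.

```python
import math

def berk_jones(n_alpha, n, alpha):
    """Berk-Jones score: n * KL(observed fraction || alpha); 0 if there is no excess."""
    frac = n_alpha / n
    if frac <= alpha:
        return 0.0
    if frac >= 1.0:
        return n * math.log(1.0 / alpha)
    return n * (frac * math.log(frac / alpha)
                + (1.0 - frac) * math.log((1.0 - frac) / (1.0 - alpha)))

def scan_nodes(pvalues, alphas=(0.01, 0.05, 0.1, 0.2, 0.5)):
    """Return (best score, node subset, alpha) over prefixes of nodes sorted by p-value."""
    order = sorted(range(len(pvalues)), key=lambda j: pvalues[j])
    best_score, best_subset, best_alpha = 0.0, [], None
    for alpha in alphas:
        n_alpha = 0  # number of p-values <= alpha in the current prefix
        for k, j in enumerate(order, start=1):
            if pvalues[j] <= alpha:
                n_alpha += 1
            score = berk_jones(n_alpha, k, alpha)
            if score > best_score:
                best_score, best_subset, best_alpha = score, sorted(order[:k]), alpha
    return best_score, best_subset, best_alpha

# toy p-values for 8 nodes: nodes 1, 3, 4, 7 look far more extreme than the background
pvalues = [0.9, 0.004, 0.6, 0.008, 0.002, 0.75, 0.4, 0.009]
score, subset, alpha = scan_nodes(pvalues)
print(score, subset, alpha)
```

a high score means some subset of nodes has way more small p-values than you'd expect if the input came from the clean background distribution.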

Complementary pixel-space subset scanning:

  • Applies the same technique to reconstruction error: identifying groups of pixels that are distorted more than expected, which aids in interpretability


Weakly Supervised Detection of Hallucinations in LLM Activations (Dec 2023)

  • weakly supervised auditing technique using a subset scanning approach to detect anomalous patterns in LLM activations from pre-trained models

    • goal is to determine if a pre-trained LLM has internalized harmful anomalous patterns (e.g., hallucinations) by examining its internal states (node activations)

  • approach only requires access to samples labeled as “normal” (true) 

  • Scanning approach: 

    • Nodes: individual activation units in one or multiple layers (e.g., transformer encoder/decoder layers)

      • For each activation unit (node) j, compare its activation on a test sentence to the empirical distribution from the reference dataset.

      • Calculate an empirical p-value indicating how extreme the activation is relative to reference activations

    • Sentences: scan across a batch of test sentences to find clusters of anomalous activations in specific sentences
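
the empirical p-value step is simple enough to sketch. assumptions on my end: a one-sided (right-tail) test and add-one smoothing, which may not match exactly what the paper does; `empirical_pvalue` and `pvalue_matrix` are hypothetical names.

```python
def empirical_pvalue(reference, value):
    """Share of reference activations at least as large as `value` (add-one smoothed)."""
    at_least = sum(1 for r in reference if r >= value)
    return (at_least + 1) / (len(reference) + 1)

def pvalue_matrix(reference_acts, test_acts):
    """Build a sentences x nodes matrix of empirical p-values.

    reference_acts[j]: node j's activations over the "normal" reference sentences
    test_acts[i][j]:   node j's activation on test sentence i
    """
    return [
        [empirical_pvalue(reference_acts[j], act) for j, act in enumerate(row)]
        for row in test_acts
    ]

# two nodes, four reference sentences each
reference_acts = [[0.1, 0.2, 0.3, 0.4], [1.0, 1.1, 0.9, 1.2]]
# first test sentence looks typical, second is extreme on both nodes
test_acts = [[0.25, 1.05], [0.9, 5.0]]
pvals = pvalue_matrix(reference_acts, test_acts)
print(pvals)
```

the scan then searches this matrix for a subset of rows (sentences) and columns (nodes) with too many small p-values.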


Deep Semi-Supervised Anomaly Detection (Feb 2020)

  • information-theoretic framework for deep anomaly detection based on the idea that the entropy of the latent distribution for normal data should be lower than the entropy of the anomalous distribution
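
quick intuition check for the entropy idea, not the paper's method: if you fit a diagonal Gaussian to a set of latent codes, its differential entropy grows with the log of the variance, so "low entropy" basically means "tightly clustered in latent space". the latent codes below are made-up numbers and `gaussian_entropy` is a hypothetical helper.

```python
import math

def gaussian_entropy(points):
    """Differential entropy (in nats) of a diagonal Gaussian fitted to `points`."""
    n, dim = len(points), len(points[0])
    entropy = 0.0
    for d in range(dim):
        column = [p[d] for p in points]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n  # must be > 0
        entropy += 0.5 * math.log(2.0 * math.pi * math.e * var)
    return entropy

# made-up 2-d latent codes: normal data is compact, anomalies are spread out
normal_latents = [[0.1, -0.1], [-0.1, 0.1], [0.05, 0.0], [-0.05, -0.02]]
anomalous_latents = [[3.0, -2.0], [-4.0, 1.5], [2.5, 4.0], [-1.0, -3.5]]
h_normal = gaussian_entropy(normal_latents)
h_anomalous = gaussian_entropy(anomalous_latents)
print(h_normal, h_anomalous)
```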


DA3G: Detecting Adversarial Attacks by Analysing Gradients

  • a general end-to-end method to detect adversarial examples based on the analysis of neural networks’ gradients

  • target-alarm structure


Obfuscated Activations Bypass LLM Latent-Space Defenses (Feb 2025)


Computational Modeling of Deep Multiresolution-Fractal Texture and Its Application to Abnormal Brain Tissue Segmentation (Jun 2023)

  • combines a multiresolution fractional Brownian motion (fBm) model with deep multiresolution analysis

  • estimates stochastic deep multiresolution fractal texture features for tumor tissues in brain MRI images


FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation

  • a geometric & probabilistic framework for unsupervised mechanistic anomaly detection in deep neural networks, geared towards adversarial attack mitigation

  • FACADE elucidates circuit contributions to the properties of high-dimensional activation modes, aiding in adversarial attack identification, seen as probabilistic outliers in geometric transformations

    • Steps:

      • probabilistic Dirichlet Process Mixture model for unsupervised clustering (DP-Means) to identify “pseudoclass” modes in intermediate activation space for a given density threshold λ

      • find circuits responsible for pseudoclass formation and propagation through causal discovery and Automatic Circuit DisCovery (ACDC) 

      • determine manifold and kernel density properties of pseudoclass propagation through circuits and in relation to final classes through mean-field theoretic approximation

      • generate a distribution over circuits as they contribute to changes in manifold properties of pseudoclasses as they propagate through the network, e.g. effective reduction in radius or dimension 
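
for the first step, here's a minimal sketch of generic DP-Means clustering (k-means that spawns a new cluster whenever a point's squared distance to every existing center exceeds λ). this is the textbook algorithm as i understand it, not FACADE's implementation; the toy "activations", the value of `lam`, and the function names are all made up.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dp_means(points, lam, max_iters=100):
    """Cluster `points`, opening a new cluster when the nearest center is farther than lam."""
    dim = len(points[0])
    # start with a single cluster at the global mean
    centers = [[sum(p[d] for p in points) / len(points) for d in range(dim)]]
    assignments = [0] * len(points)
    for _ in range(max_iters):
        changed = False
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:  # too far from everything: new cluster at this point
                centers.append(list(p))
                k = len(centers) - 1
            if assignments[i] != k:
                assignments[i] = k
                changed = True
        # recompute centers as cluster means, dropping clusters that ended up empty
        members = {}
        for i, k in enumerate(assignments):
            members.setdefault(k, []).append(points[i])
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        centers = [
            [sum(p[d] for p in members[old]) / len(members[old]) for d in range(dim)]
            for old in kept
        ]
        assignments = [remap[k] for k in assignments]
        if not changed:
            break
    return centers, assignments

# toy 2-d "activations" with two obvious modes
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centers, assignments = dp_means(points, lam=1.0)
print(len(centers), assignments)
```

the nice part is that the number of "pseudoclasses" isn't fixed up front: a smaller λ gives more, finer modes and a larger λ gives fewer.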


Examining properties of individual neurons: [2502.06809] Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution 


Mechanistic Anomaly Detection
