Explanations
references to social gatherings and alumni events
Negative Logits
Token           Logit
¢°              -0.14
argas           -0.14
otec            -0.13
strup           -0.13
UNDLE           -0.13
igi             -0.13
_learning       -0.12
preh            -0.12
training        -0.12
/Instruction    -0.12
Positive Logits
Token           Logit
alumni          0.49
Alumni          0.48
reunion         0.45
alum            0.44
al              0.40
reun            0.39
/al             0.35
umni            0.33
reunited        0.31
-al             0.31
Activations Density: 0.062%