Explanations
emails mentioned in the text
references to email communication
Negative Logits
gears      -0.81
heights    -0.74
squats     -0.73
braces     -0.70
hotter     -0.69
Ernst      -0.69
clocks     -0.68
oils       -0.66
hinges     -0.65
realism    -0.65
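Positive Logits
[token not recovered]  1.37
[token not recovered]  1.34
mails                  1.05
letter                 1.05
[token not recovered]  1.01
MA                     0.94
tymology               0.93
[token not recovered]  0.92
[token not recovered]  0.90
Newsletter             0.90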
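The two logit lists are the k most positive and k most negative entries of a per-token score for this feature. The sketch below shows only the ranking step. It assumes the scores are already computed (on dashboards of this kind they are commonly the feature's decoder direction projected through the model's unembedding, which this page does not state), and the helper name `top_and_bottom_k` is hypothetical. The sample scores are a subset of the values listed above.

```python
import heapq


def top_and_bottom_k(scores, k):
    """Return the k highest- and k lowest-scoring (token, score) pairs.

    Hypothetical helper: `scores` maps token -> logit effect, assumed precomputed.
    """
    # Most positive first, like the "Positive Logits" list.
    top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    # Most negative first, like the "Negative Logits" list.
    bottom = heapq.nsmallest(k, scores.items(), key=lambda kv: kv[1])
    return top, bottom


# A subset of the scores shown on this page.
scores = {"gears": -0.81, "heights": -0.74, "realism": -0.65,
          "mails": 1.05, "letter": 1.05, "MA": 0.94, "Newsletter": 0.90}
top, bottom = top_and_bottom_k(scores, 2)
print(top)     # [('mails', 1.05), ('letter', 1.05)]
print(bottom)  # [('gears', -0.81), ('heights', -0.74)]
```

`heapq.nlargest` and `heapq.nsmallest` are equivalent to a stable sort truncated to k items, so tied scores (here `mails` and `letter` at 1.05) keep their input order.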
Activations Density: 0.030%
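An activation density of 0.030% means the feature fires on roughly 3 of every 10,000 tokens. The sketch below assumes density is the fraction of tokens whose activation is strictly above zero. The page does not state its threshold or the dataset it sampled, and the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token "fires" when its activation is strictly greater than 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: 3 firing tokens out of 10,000.
acts = [0.0] * 9997 + [1.2, 0.4, 2.5]
print(f"{activation_density(acts):.3%}")  # 0.030%
```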