INDEX
Explanations
mentions of people's names in a formal context
colons indicating lists or enumerations
Negative Logits
Token       Logit
tremend     -0.84
turbulence  -0.76
behav       -0.75
behavi      -0.72
pursu       -0.72
undai       -0.69
ilater      -0.67
troubles    -0.65
overl       -0.64
padd        -0.64
Positive Logits
Token       Logit
Yeah        0.90
Who         0.84
Rise        0.82
Bringing    0.82
Cosponsors  0.81
Reloaded    0.79
YES         0.79
Join        0.78
Provided    0.77
Originally  0.76
Activation density: 0.086%