INDEX
Explanations
phrases that emphasize repetition and redundancy
Negative Logits
Token        Logit
Authors      -0.62
Ge           -0.61
Thirty       -0.60
saf          -0.59
itans        -0.57
vana         -0.56
afety        -0.56
Syndicate    -0.56
VICE         -0.56
ogens        -0.56
Positive Logits
Token        Logit
again        1.06
etheless     0.94
drive        0.94
again        0.81
ride         0.76
clock        0.75
until        0.73
repe         0.73
stretched    0.70
haul         0.70
Activations Density 0.009%
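
To put the density figure in concrete terms, here is a minimal sketch that assumes "activations density" means the fraction of token positions on which the feature fires (activation above zero). That definition, the function name activation_density, the threshold parameter, and the sample data are all illustrative assumptions, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the feature
    is active. The page itself does not define the term.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires on 9 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(9):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.009% corresponds to roughly 9 active positions per 100,000 tokens, so the feature is very sparse.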