Explanations
terms related to minimalism
references to minimalism
Negative Logits
Token     Logit
lyn       -0.73
Report    -0.72
Sah       -0.70
ravings   -0.67
dor       -0.65
Phill     -0.65
uay       -0.65
Queen     -0.65
Gener     -0.65
RAFT      -0.64
Positive Logits
Token       Logit
istic       1.12
istically   0.99
etheless    0.99
amount      0.94
ism         0.89
izes        0.88
ocre        0.87
isite       0.86
amounts     0.84
effort      0.84
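Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below illustrates that procedure under this assumption. It is not confirmed as this dashboard's exact method. The decoder vector and unembedding columns are hypothetical toy values, and only the token strings are borrowed from the tables.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding column (assumed method)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom

# Hypothetical 2-dimensional toy data, for illustration only.
decoder_dir = [1.0, -0.5]
unembed = {
    "ism":    [0.8, -0.2],
    "istic":  [1.0, 0.0],
    "Queen":  [-0.5, 0.3],
    "Report": [0.0, 1.0],
}
effects = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(effects, k=2)
```

With these toy values, `top` ranks `istic` above `ism`, and `bottom` ranks `Queen` below `Report`. This mirrors the layout of the tables, but the numbers are not the real ones.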
Activation density: 0.015%
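Activation density is usually the fraction of sampled tokens on which the feature's activation is positive. This sketch assumes that convention. At 0.015% (1.5 × 10⁻⁴), the feature would fire on roughly 15 of every 100,000 tokens. The activation list below is synthetic and only illustrates the arithmetic.

```python
from typing import Sequence

def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens whose activation is strictly positive
    (assumed definition of 'density')."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > 0.0)
    return fired / len(activations)

# Synthetic example: 15 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(15):
    acts[i * 6_000] = 1.0
density = activation_density(acts)
density_pct = density * 100  # express as a percentage
```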