INDEX
Explanations
references to challenges and resources in various contexts
Negative Logits
flen      -0.14
elig      -0.14
.allow    -0.14
ende      -0.13
Fucked    -0.13
lending   -0.13
ваг       -0.13
unami     -0.13
eron      -0.13
ï         -0.13
Positive Logits
consume     0.47
consumes    0.41
Consum      0.41
consuming   0.40
consume     0.39
eats        0.38
drain       0.36
eat         0.35
Drain       0.33
devour      0.33
Activations Density: 0.263%