INDEX
Explanations
references to the concept of an "ideal" in various contexts
Negative Logits
eldon     -0.15
dale      -0.15
assen     -0.15
Howard    -0.15
kd        -0.14
dio       -0.14
oran      -0.14
ktor      -0.14
/how      -0.14
than      -0.13
Positive Logits
istic         0.19
mente         0.18
istically     0.16
ably          0.16
ivil          0.15
iminal        0.15
cala          0.15
conditions    0.15
imal          0.15
iterals       0.15
Activations Density 0.031%