Explanations
interesting events, decisions, or developments in various contexts
Negative Logits
uts          -1.09
aper         -0.99
otent        -0.96
utable       -0.95
helle        -0.94
vr           -0.93
hent         -0.92
eded         -0.90
apers        -0.89
reditation   -0.89
Positive Logits
ioned         1.21
Flavoring     1.14
tid           1.13
twists        1.08
sidel         1.05
arios         1.02
ly            1.01
trivia        0.99
insights      0.98
andum         0.96
Activation density: 1.075%