INDEX
Explanations
phrases describing changes in numbers or values
Negative Logits
Token       Logit
sonian      -0.67
ardi        -0.63
uminati     -0.60
sama        -0.60
ologies     -0.59
ician       -0.58
Pastebin    -0.58
exe         -0.58
Ide         -0.56
_-          -0.56
Positive Logits
Token       Logit
downhill    1.14
up          1.06
DOWN        0.97
backwards   0.97
downwards   0.94
down        0.94
upwards     0.93
sideways    0.90
upward      0.88
steadily    0.86
Activations Density: 0.046%