INDEX
Explanations
phrases indicating uncertainty or choice
repeated usage of the word "which" across various contexts
Negative Logits
Rog       -0.81
GROUND    -0.79
Balt      -0.76
kj        -0.73
bly       -0.72
kamp      -0.71
UX        -0.71
FINE      -0.69
Glob      -0.69
fitting   -0.67
Positive Logits
kinds         0.91
sorts         0.81
soever        0.78
redes         0.76
types         0.74
direction     0.70
kind          0.70
flavors       0.69
contingency   0.68
ones          0.66
Activations Density: 0.051%