Explanations
Terms related to observation and monitoring
Negative Logits
Token                  Logit
A                      -0.70
P                      -0.68
P                      -0.65
The                    -0.64
(token not rendered)   -0.62
All                    -0.61
<eos>                  -0.60
K                      -0.60
C                      -0.58
It                     -0.58
Positive Logits
Token                  Logit
observations            1.53
Observations            1.53
observations            1.51
Observation             1.49
OBSERV                  1.47
obser                   1.46
observes                1.45
Observ                  1.43
observation             1.41
OBSERV                  1.39
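The dashboard does not state how these logit values are produced. One common approach for sparse-autoencoder features, assumed here rather than confirmed by the page, is to project the feature's decoder direction through the model's unembedding matrix and then rank the resulting per-token scores. The sketch below uses plain Python lists; the function names `logit_effects` and `top_and_bottom` are hypothetical and do not belong to any particular library.

```python
def logit_effects(decoder_dir, unembed):
    """Project a feature's decoder direction through the unembedding matrix.

    Assumption: decoder_dir is a list of d_model floats (the feature's decoder row)
    and unembed is a d_model x vocab matrix given as a list of rows.
    Returns one logit contribution per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, tokens, k):
    """Return the k most positive and k most negative (token, logit) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]                  # most negative first
    top = list(reversed(ranked[-k:]))    # most positive first
    return top, bottom


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
tokens = ["observations", "The", "A"]
effects = logit_effects([1.0, 0.5], [[2.0, -1.0, 0.0], [0.0, 0.0, -1.0]])
top, bottom = top_and_bottom(effects, tokens, k=1)
```

In this toy case the scores are `[2.0, -1.0, -0.5]`, so "observations" ranks as the top positive logit and "The" as the most negative, mirroring how the two lists above are ordered.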
Activation Density: 0.079%
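Activation density is usually read as the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the page does not state its exact threshold, and the function name `activation_density` is hypothetical. Under that reading, 0.079% corresponds to about 79 active positions per 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > threshold
    (0.0 by default), which is typical for ReLU-style sparse autoencoders.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 79 active positions out of 100,000 tokens.
acts = [0.0] * 99_921 + [1.0] * 79
density = activation_density(acts)
print(f"Activation Density: {density:.3%}")
```

A very low density like this one indicates a sparse feature that stays silent on almost all tokens and activates only in a narrow context, consistent with the observation-related tokens it promotes.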