Explanations
phrases related to observation or visibility
Negative Logits
cycl        -0.69
eware       -0.66
uga         -0.64
ecycle      -0.62
ourse       -0.62
mbudsman    -0.62
rites       -0.60
ranged      -0.59
nation      -0.59
cake        -0.59
Positive Logits
why             1.08
how             0.89
clearly         0.86
whats           0.81
glimps          0.81
WHY             0.79
plainly         0.78
traces          0.76
similarities    0.75
signs           0.75
Activations Density 0.090%