INDEX
Explanations
phrases indicating roles, templates, or guiding frameworks
Negative Logits
aspect      -0.15
pickle      -0.14
aku         -0.14
hallmark    -0.14
loff        -0.14
icol        -0.13
aspects     -0.13
alla        -0.13
pans        -0.13
itures      -0.13
Positive Logits
starting     0.46
starting     0.39
guide        0.37
Starting     0.37
Starting     0.36
guide        0.33
reference    0.31
jumping      0.30
reference    0.29
guides       0.28
Activation Density 0.165%