INDEX
Explanations
actions or tasks that someone can do
phrases expressing capabilities or actions
Negative Logits
Token       Logit
ONSORED     -0.69
theless     -0.66
lights      -0.66
sentenced   -0.64
laus        -0.62
Canal       -0.62
Hung        -0.60
zilla       -0.59
Tai         -0.59
hattan      -0.59
Positive Logits
Token       Logit
omething    0.91
omsday      0.89
pez         0.81
anything    0.81
hing        0.77
ggy         0.77
xa          0.76
xx          0.74
ozy         0.74
xy          0.74
Activations Density: 0.039%