INDEX
Explanations
adjectives related to intensity or enhancement
Negative Logits
Goo     -0.63
orers   -0.62
CLA     -0.58
zip     -0.56
opers   -0.55
craw    -0.55
ANGE    -0.55
emp     -0.55
lore    -0.55
CoC     -0.53
POSITIVE LOGITS
by              1.23
by              0.91
BY              0.88
By              0.87
By              0.83
anew            0.81
igated          0.77
exponentially   0.74
aback           0.73
.</             0.72
Activations Density 0.148%