INDEX
Explanations
words related to forceful and impactful actions
words related to flashy or attention-grabbing actions
Negative Logits
ervation         -0.80
Gutenberg        -0.77
yden             -0.76
otype            -0.75
brance           -0.74
swer             -0.74
asure            -0.73
elsen            -0.71
ussen            -0.71
communication    -0.71
Positive Logits
OUT              0.80
Hur              0.77
IELD             0.76
arthy            0.76
ASH              0.75
hur              0.74
eed              0.73
Comment          0.73
Ùħ               0.72
UFF              0.70
Activations Density 0.020%