INDEX
Explanations
words related to changes in perception or stance
Negative Logits
ngth        -0.89
APH         -0.75
PU          -0.68
Brach       -0.66
ccoli       -0.66
staking     -0.64
gel         -0.63
Barron      -0.63
cade        -0.63
Consumer    -0.61
Positive Logits
tack        0.98
tune        0.91
wording     0.91
direction   0.88
course      0.88
diapers     0.86
name        0.83
diaper      0.83
complexion  0.82
allegiance  0.81
Activations Density 0.097%