Explanations
words related to strong emotional reactions or impactful moments
Negative Logits
Token         Logit
hips          -0.84
hift          -0.75
Stephenson    -0.71
eton          -0.70
mith          -0.68
Naz           -0.67
Avalon        -0.66
Shogun        -0.65
manship       -0.65
Standing      -0.64
Positive Logits
Token         Logit
ierrez        1.23
ted           1.14
ters          1.11
tering        1.06
tered         1.04
ting          1.01
osc           0.95
warts         0.93
microbiota    0.92
rition        0.92
Activations Density: 0.017%