INDEX
Explanations
words related to senses like taste and physical sensations
words related to taste and sensory experiences
Negative Logits
wired        -0.71
ogue         -0.67
ucle         -0.64
istor        -0.61
uclear       -0.60
WER          -0.59
ocally       -0.59
ising        -0.58
ignorance    -0.58
ãĥ«          -0.58
Positive Logits
hetics        0.87
ript          0.87
lest          0.87
lers          0.85
ean           0.80
roxy          0.80
olicy         0.79
otle          0.78
nces          0.77
akespeare     0.75
Activations Density 0.095%