INDEX
Explanations
names of political figures
specific geographical or cultural identifiers and names
Negative Logits
envy        -0.61
OPLE        -0.55
kittens     -0.55
FACE        -0.54
staking     -0.53
condem      -0.53
lished      -0.53
notch       -0.52
readiness   -0.50
puzz        -0.50
Positive Logits
oli         0.76
ema         0.75
ak          0.75
ich         0.74
oz          0.74
am          0.73
rad         0.72
ana         0.72
aj          0.71
ar          0.71
Activations Density 0.319%