INDEX
Explanations
proper nouns and specific names of people, locations, or titles
Negative Logits
emis      -0.82
cycles    -0.72
asures    -0.71
izont     -0.68
ciples    -0.68
boards    -0.67
hips      -0.67
ocks      -0.66
pots      -0.66
perse     -0.65
Positive Logits
pesky         1.15
fateful       1.01
same          1.00
cher          0.99
kind          0.94
translates    0.87
sort          0.86
mattered      0.83
particular    0.83
elusive       0.82
Activations Density: 1.614%