Explanations
names of specific individuals
names of individuals involved in news or entertainment contexts
Negative Logits
ceptor        -0.82
conservancy   -0.73
req           -0.72
Ŀ             -0.72
opal          -0.70
nesota        -0.68
respond       -0.67
science       -0.67
acting        -0.66
cephal        -0.66
Positive Logits
Lambert       1.19
Lerner        0.90
oyd           0.71
igue          0.70
ucci          0.68
enstein       0.68
ichick        0.67
rikes         0.66
Dull          0.66
DRAG          0.66
Activations Density: 0.013%