INDEX
Explanations
names of individuals
proper nouns, particularly names of individuals, associated with significant events or actions
Negative Logits
tered    -0.83
izont    -0.81
rine     -0.80
ration   -0.80
ional    -0.80
urity    -0.77
ihad     -0.76
pees     -0.75
raq      -0.74
oded     -0.73
Positive Logits
giving   0.82
mares    0.78
bye      0.72
Newman   0.68
ening    0.68
ships    0.66
fixme    0.65
bourg    0.65
stadt    0.64
Safety   0.64
Activations Density: 0.052%