INDEX
Explanations
the word "Nam" at varying activation levels
references to specific names, particularly related to individuals and places
Negative Logits
Introduced     -0.81
Progressive    -0.74
Interview      -0.72
UID            -0.67
Subtle         -0.65
ECB            -0.64
subtitle       -0.62
sample         -0.62
ACTED          -0.61
Millenn        -0.61
Positive Logits
nam            1.67
borgh          1.05
ukong          1.03
ned            1.03
orously        1.01
emi            0.95
emn            0.95
ovember        0.93
icol           0.93
rish           0.91
Activations Density 0.008%