INDEX
Explanations
the word "ite" with varying activation values
NEGATIVE LOGITS
Token       Logit
ington      -0.91
INGTON      -0.87
ood         -0.85
nut         -0.82
nuts        -0.81
wards       -0.80
noon        -0.77
SIGN        -0.75
loo         -0.74
NCT         -0.73
POSITIVE LOGITS
Token       Logit
chnology    1.26
lli         1.23
geist       0.95
llo         0.83
lla         0.83
eer         0.77
gregation   0.75
chn         0.74
Scotia      0.73
xt          0.72
Activations Density 0.045%