INDEX
Explanations
mentions of high-ranking or significant figures, places, or entities
instances of the word "arch" in various contexts
Negative Logits
Token     Logit
Ô         -0.92
arettes   -0.85
uana      -0.84
leneck    -0.74
ORTS      -0.71
ña        -0.71
zzle      -0.71
ABE       -0.64
asley     -0.64
ACTION    -0.64
Positive Logits
Token     Logit
bishop    1.01
arch      0.98
ipel      0.96
rival     0.81
itect     0.78
ival      0.78
nem       0.76
hum       0.75
ivist     0.75
di        0.75
Activations Density: 0.005%