Explanations
instances of Spanish text with varying activations, especially related to organizational structures, political figures, and specific locations
Negative Logits
velt          -0.50
stride        -0.50
pedia         -0.50
tide          -0.50
tires         -0.50
signalling    -0.49
Shell         -0.48
kered         -0.48
tire          -0.48
locom         -0.47
Positive Logits
BIL           0.59
gage          0.57
Ĭ±            0.55
ellar         0.53
gmail         0.53
phia          0.53
ruction       0.52
ogyn          0.51
atl           0.51
aternity      0.50
Activations Density 17.332%